| Title | Multi-modal person retrieval : bridging text, images, and re-identification |
| Author | Ati Tesakulsiri |
| Call Number | AIT Thesis no.DSAI-24-04 |
| Subject(s) | Pattern recognition systems; Computer vision |
| Note | A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Data Science and Artificial Intelligence |
| Publisher | Asian Institute of Technology |
| Abstract | Person re-identification (Re-ID) attempts to recognize the same subject across multiple cameras despite changes in lighting, posture, and point of view. Motivated by the remarkable results of CLIP (Contrastive Language-Image Pre-training), we investigate the possibility of using contrastive learning with a combination of text encoders and Re-ID models. With the locked-image text tuning (LiT) method, the resources and time needed to train the model are not an issue. With 5x less training time and 3x to 18x less resource consumption (GPU RAM), we are able to train the model to learn the attributes of a person, reaching 83.98% accuracy on 4/5 matches (2616 classes) on an unseen dataset, which is higher than the current SOTA model for person retrieval. This study also covers multilingual retrieval. However, this network combination is unable to learn changes in view, posture, and lighting, which makes the approach lag behind on person retrieval benchmarks. |
| Year | 2024 |
| Type | Thesis |
| School | School of Engineering and Technology |
| Department | Department of Information and Communications Technologies (DICT) |
| Academic Program/FoS | Data Science and Artificial Intelligence (DSAI) |
| Chairperson(s) | Mongkol Ekpanyapong |
| Examination Committee(s) | Huynh, Trung Luong; Chaklam Silpasuwanchai |
| Scholarship Donor(s) | Royal Thai Government Fellowship |
| Degree | Thesis (M. Sc.) - Asian Institute of Technology, 2024 |