AIT Asian Institute of Technology

Multi-modal person retrieval : bridging text, images, and re-identification

Author: Ati Tesakulsiri
Call Number: AIT Thesis no. DSAI-24-04
Subject(s): Pattern recognition systems; Computer vision

Note: A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Data Science and Artificial Intelligence
Publisher: Asian Institute of Technology
Abstract: Person Re-Identification (Re-ID) attempts to recognize the same subject across multiple cameras despite changes in lighting, posture, and point of view. Motivated by the remarkable results of CLIP (Contrastive Language-Image Pre-training), we investigate the possibility of using contrastive learning in conjunction with a combination of text encoders and Re-ID models. With the locked-image text tuning (LiT) method, the resources and time needed to train the model are not an issue. With 5x less training time and 3x to 18x lower resource consumption (GPU RAM), we are able to train the model to learn person attributes, reaching 83.98% accuracy on 4/5 matches (2,616 classes) on an unseen dataset, which is higher than the current SOTA model for person retrieval. We also leverage multilingual retrieval in this study. However, this network combination is unable to learn changes in view, posture, and lighting, which makes this approach lag behind the person retrieval benchmarks.
Year: 2024
Type: Thesis
School: School of Engineering and Technology
Department: Department of Information and Communications Technologies (DICT)
Academic Program/FoS: Data Science and Artificial Intelligence (DSAI)
Chairperson(s): Mongkol Ekpanyapong
Examination Committee(s): Huynh, Trung Luong; Chaklam Silpasuwanchai
Scholarship Donor(s): Royal Thai Government Fellowship
Degree: Thesis (M. Sc.) - Asian Institute of Technology, 2024

