AIT Asian Institute of Technology

Transformer-based aggregation for sequential face recognition

Author: Jednipat Moonrinta
Call Number: AIT Diss. no. CS-25-01
Subject(s): Human face recognition (Computer science); Human-computer interaction; Computer vision
Note: A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science
Publisher: Asian Institute of Technology
Abstract: In the past decade, advancements in face recognition have achieved accuracy levels comparable to human performance. The most effective models perform face embedding, in which a cropped face image is projected into a vector space such that embeddings of the same individual's face under different imaging conditions tend to lie close together, while embeddings of different individuals' faces tend to lie far apart. Face embedding models developed using machine learning have driven success in various real-world applications such as face verification for authentication. However, these systems typically require high-quality input images for both gallery and probe samples, an assumption that does not always hold in real-world scenarios. Variations in image quality, illumination, face pose, and occlusion can significantly degrade performance. To address the challenge of noisy face embeddings caused by inconsistent image quality, we propose a deep learning-based approach that aggregates multiple face embeddings, either sequential or non-sequential, to construct a more robust face representation for verification.

First, we develop a Transformer-based model that processes face embeddings extracted by a high-performance face feature extractor. The model takes a batch of embeddings as input and generates a refined face representation for verification.

Second, we introduce an adaptive triplet loss function that extends the traditional triplet loss for face embedding models by applying separate margin thresholds to positive and negative pairs. We evaluate both fixed margins and margins that adapt dynamically to the data on the fly. The adaptive triplet loss mitigates excessive penalization of differences between positive pairs and similarities between negative pairs.

Third, we implement a two-stage training strategy comprising pre-training and fine-tuning phases. In the pre-training phase, the model is trained with a root mean squared error loss to produce embeddings similar to those obtained via average pooling. In the fine-tuning phase, the adaptive triplet loss described above is employed to further refine the model. In a series of experiments, we train the proposed model with this two-stage approach on the YouTubeFaces dataset and evaluate its performance on multiple benchmark datasets, including IMFDB, CASIA-WebFace, CCVID, IJB-B, and IJB-C.

The experimental results demonstrate that our method outperforms the baseline on most datasets. Our model generalizes well to unseen datasets and performs exceptionally well on the IMFDB dataset, where significant variations in face pose, lighting conditions, and scene context exist within a single identity.

Furthermore, to assess the model's effectiveness in real-world deployment, we integrate our Transformer-based aggregation model into an elder care scenario using a telehealth dataset. In this setting, the model must verify individuals from video sequences captured in uncontrolled environments with natural lighting and high face pose variation. Our method improves over the average pooling baseline, confirming the practical utility of our approach for improving face verification reliability in challenging real-world conditions. We conclude that the methodology is an effective way to exploit the availability of multiple images of an individual when performing face verification under adverse conditions.
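The abstract describes the adaptive triplet loss only at a high level: separate margin thresholds for positive and negative pairs, hinged independently so that already-close positive pairs and already-distant negative pairs are not over-penalized. The following is a minimal sketch of one loss function matching that description; the function name, margin defaults, distance metric, and combination rule are illustrative assumptions, not the dissertation's actual definitions.

```python
import numpy as np

def adaptive_triplet_loss(anchor, positive, negative,
                          pos_margin=0.3, neg_margin=1.0):
    """Dual-margin triplet loss sketch (assumed formulation).

    The classic triplet loss penalizes the single gap
    d(a, p) - d(a, n) + m. Here, each pair type gets its own hinge:
    a positive pair is penalized only when its distance exceeds
    pos_margin, and a negative pair only when its distance falls
    below neg_margin.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=-1)  # anchor-positive distances
    d_neg = np.linalg.norm(anchor - negative, axis=-1)  # anchor-negative distances
    # Hinge each term independently: well-separated triplets contribute zero.
    loss = np.maximum(d_pos - pos_margin, 0.0) + np.maximum(neg_margin - d_neg, 0.0)
    return loss.mean()
```

With fixed margins this is the static variant; the dynamically adjusted margins mentioned in the abstract would replace `pos_margin` and `neg_margin` with values computed from the current batch statistics.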
Year: 2025
Type: Dissertation
School: School of Engineering and Technology
Department: Department of Information and Communications Technologies (DICT)
Academic Program/FoS: Computer Science (CS)
Chairperson(s): Dailey, Mathew N.
Examination Committee(s): Huynh, Trung Luong; Chaklam Silpasuwanchai; Adisorn Lertsinsrubtavee
Scholarship Donor(s): Royal Thai Government; AIT Fellowship
Degree: Thesis (Ph.D.) - Asian Institute of Technology, 2025

