AIT Asian Institute of Technology

Transformer-based aggregation for sequential face recognition

Author: Jednipat Moonrinta
Call Number: AIT Diss. no. CS-25-01
Subject(s): Human face recognition (Computer science); Human-computer interaction; Computer vision
Note: A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science
Publisher: Asian Institute of Technology
Abstract: In the past decade, advancements in face recognition have achieved accuracy levels comparable to human performance. The most effective models perform face embedding, in which a cropped face image is projected into a vector space such that embeddings of the same individual's face under different imaging conditions tend to lie close together, while embeddings of different individuals' faces tend to lie far apart. Face embedding models developed using machine learning have driven success in various real-world applications such as face verification for authentication. However, these systems typically require high-quality input images for both gallery and probe samples, an assumption that does not always hold in real-world scenarios. Variations in image quality, illumination, face pose, and occlusion can significantly degrade performance. To address the challenge of noisy face embeddings caused by inconsistent image quality, we propose a deep learning-based approach that aggregates multiple face embeddings, either sequential or non-sequential, to construct a more robust face representation for verification.

First, we develop a Transformer-based model that processes face embeddings extracted by a high-performance face feature extractor. The model takes a batch of embeddings as input and generates a refined face representation for verification.

Second, we introduce an adaptive triplet loss function that extends the traditional triplet loss for face embedding models by applying separate margin thresholds to positive and negative pairs. We evaluate both fixed margins and margins that adapt dynamically to the data on the fly. The adaptive triplet loss mitigates excessive penalization of differences between positive pairs and similarities between negative pairs.

Third, we implement a two-stage training strategy comprising pre-training and fine-tuning phases. In the pre-training phase, the model is trained with a root mean squared error loss to produce embeddings similar to those obtained via average pooling. In the fine-tuning phase, the adaptive triplet loss described above is employed to further refine the model. In a series of experiments, we train the proposed model with this two-stage approach on the YouTubeFaces dataset and evaluate its performance on multiple benchmark datasets, including IMFDB, CASIA-WebFace, CCVID, IJB-B, and IJB-C.

The experimental results demonstrate that our method outperforms the baseline on most datasets. Our model generalizes well to unseen datasets and performs exceptionally well on the IMFDB dataset, where significant variations in face pose, lighting conditions, and scene context exist within a single identity.

Furthermore, to assess the model's effectiveness in real-world deployment, we integrate our Transformer-based aggregation model into an elder care scenario using a telehealth dataset. In this setting, the model must verify individuals from video sequences captured in uncontrolled environments with natural lighting and high face pose variation. Our method improves over the average pooling baseline, confirming the practical utility of our approach for improving face verification reliability in challenging real-world conditions. We conclude that the methodology is an effective way to exploit the availability of multiple images of an individual when performing face verification under adverse conditions.
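The abstract describes the adaptive triplet loss only at a high level: separate margin thresholds for positive and negative pairs, hinged independently so that already-close positive pairs and already-distant negative pairs are not over-penalized. The following is a minimal sketch of one loss function matching that description; the function name, margin defaults, distance metric, and combination rule are illustrative assumptions, not the dissertation's actual definitions.

```python
import numpy as np

def adaptive_triplet_loss(anchor, positive, negative,
                          pos_margin=0.3, neg_margin=1.0):
    """Dual-margin triplet loss sketch (assumed formulation).

    The classic triplet loss penalizes the single gap
    d(a, p) - d(a, n) + m. Here, each pair type gets its own hinge:
    a positive pair is penalized only when its distance exceeds
    pos_margin, and a negative pair only when its distance falls
    below neg_margin.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=-1)  # anchor-positive distances
    d_neg = np.linalg.norm(anchor - negative, axis=-1)  # anchor-negative distances
    # Hinge each term independently: well-separated triplets contribute zero.
    loss = np.maximum(d_pos - pos_margin, 0.0) + np.maximum(neg_margin - d_neg, 0.0)
    return loss.mean()
```

With fixed margins this is the static variant; the dynamically adjusted margins mentioned in the abstract would replace `pos_margin` and `neg_margin` with values computed from the current batch statistics.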
Year: 2025
Type: Dissertation
School: School of Engineering and Technology
Department: Department of Information and Communications Technologies (DICT)
Academic Program/FoS: Computer Science (CS)
Chairperson(s): Dailey, Mathew N.
Examination Committee(s): Huynh, Trung Luong; Chaklam Silpasuwanchai; Adisorn Lertsinsrubtavee
Scholarship Donor(s): Royal Thai Government; AIT Fellowship
Degree: Thesis (Ph.D.) - Asian Institute of Technology, 2025

