Learning to Cluster Faces via Transformer
Jinxing Ye, Xiaojiang Peng, Baigui Sun, Kai Wang, Xiuyu Sun, Hao Li, Hanqing Wu

TL;DR
This paper introduces a Face Transformer model that improves face clustering accuracy by leveraging local context and relation encoding, achieving state-of-the-art results on benchmark datasets.
Contribution
The paper proposes a novel Face Transformer architecture that decomposes face clustering into relation encoding and linkage prediction, enhancing robustness and accuracy.
Findings
Achieves 91.12% pairwise F-score on MS-Celeb-1M
Outperforms existing methods on face clustering benchmarks
Demonstrates robustness to pose, occlusion, and image quality variations
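For readers unfamiliar with the metric quoted above, pairwise F-score is the harmonic mean of precision and recall computed over pairs of samples: a pair counts as a true positive when both the ground truth and the predicted clustering put its two faces in the same group. The following is a minimal sketch of that standard definition, not the paper's evaluation script; the function name and the label-list input format are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def pairwise_f_score(true_labels, pred_labels):
    """Return (precision, recall, F-score) over sample pairs.

    Hypothetical helper: both arguments are equal-length sequences giving
    the identity label and the predicted cluster label of each sample.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")

    # Number of pairs grouped together by the prediction, by the ground
    # truth, and by both at once (the true-positive pairs).
    same_pred = sum(comb(n, 2) for n in Counter(pred_labels).values())
    same_true = sum(comb(n, 2) for n in Counter(true_labels).values())
    same_both = sum(
        comb(n, 2) for n in Counter(zip(true_labels, pred_labels)).values()
    )

    precision = same_both / same_pred if same_pred else 0.0
    recall = same_both / same_true if same_true else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

Counting pairs through cluster sizes avoids enumerating every pair explicitly, which matters on benchmarks with millions of faces.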
Abstract
Face clustering is a useful tool for applications like automatic face annotation and retrieval. The main challenge is that it is difficult to cluster images from the same identity with different face poses, occlusions, and image quality. Traditional clustering methods usually ignore the relationship between individual images and their neighbors which may contain useful context information. In this paper, we repurpose the well-known Transformer and introduce a Face Transformer for supervised face clustering. In Face Transformer, we decompose the face clustering into two steps: relation encoding and linkage predicting. Specifically, given a face image, a relation encoder module aggregates local context information from its neighbors and a linkage predictor module judges whether a pair of images belong to the same cluster or not. In the local linkage graph view, Face…
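The two-step decomposition described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the relation encoder here is a single parameter-free attention step rather than a trained Transformer, the linkage predictor is a fixed cosine threshold rather than a learned module, and turning predicted links into clusters by connected components is an assumption, since the abstract is truncated before that point. All function names and the parameters `k`, `temperature`, and `threshold` are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def knn_indices(features, k):
    """Indices of each face's k nearest neighbours by cosine similarity."""
    sims = features @ features.T
    np.fill_diagonal(sims, -np.inf)  # a face is not its own neighbour
    return np.argsort(-sims, axis=1)[:, :k]


def encode_relations(features, neighbors, temperature=0.1):
    """Stand-in relation encoder: each face attends over itself and its
    neighbours, and its new feature is the attention-weighted average."""
    encoded = np.empty_like(features)
    for i, nbrs in enumerate(neighbors):
        context = features[np.concatenate(([i], nbrs))]  # shape (k + 1, d)
        scores = context @ features[i] / temperature
        weights = np.exp(scores - scores.max())  # numerically stable softmax
        weights /= weights.sum()
        encoded[i] = weights @ context
    return l2_normalize(encoded)


def predict_links(encoded, neighbors, threshold=0.8):
    """Stand-in linkage predictor: link a face to a neighbour when the
    cosine similarity of their encoded features exceeds the threshold."""
    return [
        (i, int(j))
        for i, nbrs in enumerate(neighbors)
        for j in nbrs
        if encoded[i] @ encoded[j] > threshold
    ]


def clusters_from_links(n, links):
    """Assumed final step: connected components via union-find."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a, b in links:
        parent[find(a)] = find(b)
    return [find(i) for i in range(n)]
```

A typical call sequence on L2-normalized face features would be `knn_indices`, then `encode_relations`, then `predict_links`, then `clusters_from_links`; in the paper's setting the two middle steps are learned with supervision rather than fixed.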
Taxonomy
Topics: Face recognition and analysis · Face and Expression Recognition · Biometric Identification and Security
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Softmax · Layer Normalization · Residual Connection · Byte Pair Encoding · Adam
