LVFace: Progressive Cluster Optimization for Large Vision Models in Face Recognition
Jinghan You, Shanglin Li, Yuanrui Sun, Jiangchuan Wei, Mingyu Guo, Chao Feng, Jiao Ran

TL;DR
LVFace introduces a ViT-based face recognition model with Progressive Cluster Optimization, achieving state-of-the-art results and demonstrating scalability and robustness in large-scale and real-world scenarios.
Contribution
The paper presents LVFace, a novel ViT-based face recognition framework trained with Progressive Cluster Optimization (PCO), which improves performance and training stability over CNN-inspired training paradigms.
Findings
LVFace surpasses leading face recognition models on multiple benchmarks.
LVFace achieves first place in the ICCV 2021 Masked Face Recognition Challenge.
LVFace demonstrates scalability and compatibility with mainstream vision and language models.
Abstract
Vision Transformers (ViTs) have revolutionized large-scale visual modeling, yet remain underexplored in face recognition (FR), where CNNs still dominate. We identify a critical bottleneck: CNN-inspired training paradigms fail to unlock ViT's potential, leading to suboptimal performance and convergence instability. To address this challenge, we propose LVFace, a ViT-based FR model that integrates Progressive Cluster Optimization (PCO) to achieve superior results. Specifically, PCO sequentially applies negative class sub-sampling (NCS) for robust and fast feature alignment from random initialization, feature expectation penalties for centroid stabilization, and cluster boundary refinement through full-batch training without NCS constraints. LVFace establishes a new state-of-the-art face recognition baseline, surpassing leading approaches such as UniFace and TopoFR across multiple…
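The negative class sub-sampling (NCS) step named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the loss, the sampling ratio, or the stage schedule, so the cosine-softmax loss, the `keep_ratio` and `scale` parameters, and the function names below are all assumptions chosen for illustration. The idea shown is only that each step scores features against the classes present in the batch plus a random subset of the remaining (negative) class centers. Setting `keep_ratio=1.0` recovers training over all classes, which loosely corresponds to the final stage without NCS constraints. The feature expectation penalty is not sketched, because the abstract does not define it.

```python
import numpy as np

def sample_classes(labels, num_classes, keep_ratio, rng):
    """Return sorted class indices: every class in the batch plus random negatives.

    Hypothetical helper; keep_ratio is an assumed hyperparameter in (0, 1].
    """
    positives = np.unique(labels)
    num_keep = max(int(round(num_classes * keep_ratio)), positives.size)
    negatives = np.setdiff1d(np.arange(num_classes), positives)
    extra = rng.choice(negatives, size=num_keep - positives.size, replace=False)
    return np.sort(np.concatenate([positives, extra]))

def subsampled_softmax_loss(features, centers, labels, keep_ratio, rng, scale=64.0):
    """Cosine-softmax cross-entropy over a sub-sampled set of class centers.

    The plain cosine softmax (no margin) is an assumption made for brevity.
    """
    kept = sample_classes(labels, centers.shape[0], keep_ratio, rng)
    # Map global labels to their positions inside the kept (sorted) class set.
    local_labels = np.searchsorted(kept, labels)
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = centers[kept] / np.linalg.norm(centers[kept], axis=1, keepdims=True)
    logits = scale * (f @ w.T)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(labels.size), local_labels].mean()

rng = np.random.default_rng(0)
num_classes, dim = 100, 16
centers = rng.normal(size=(num_classes, dim))   # toy class centers
features = rng.normal(size=(4, dim))            # toy batch of embeddings
labels = np.array([3, 7, 3, 1])

loss_ncs = subsampled_softmax_loss(features, centers, labels, 0.1, rng)
loss_full = subsampled_softmax_loss(features, centers, labels, 1.0, rng)
```

Because the sub-sampled softmax normalizes over fewer negatives, its loss on a given batch is never larger than the loss computed over all classes. It also reduces the per-step cost of the classification layer, which is the usual motivation for this kind of sampling.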
Taxonomy
Topics: Face recognition and analysis
