PaCo-FR: Patch-Pixel Aligned End-to-End Codebook Learning for Facial Representation Pre-training
Yin Xie, Zhichao Chen, Zeyu Xiao, Yongle Zhao, Xiang An, Kaicheng Yang, Zimin Ran, Jia Guo, Ziyong Feng, Jiankang Deng

TL;DR
PaCo-FR is an unsupervised facial representation pre-training method that combines structured masking, patch-based codebook, and spatial constraints to improve facial analysis tasks with limited labeled data.
Contribution
It introduces a novel framework integrating spatially coherent masking, a patch-based codebook, and spatial constraints for improved facial feature learning.
Findings
Achieves state-of-the-art results on facial analysis tasks.
Effective with only 2 million unlabeled images for pre-training.
Improves robustness to pose, occlusion, and lighting variations.
Abstract
Facial representation pre-training is crucial for tasks like facial recognition, expression analysis, and virtual reality. However, existing methods face three key challenges: (1) failing to capture distinct facial features and fine-grained semantics, (2) ignoring the spatial structure inherent to facial anatomy, and (3) inefficiently utilizing limited labeled data. To overcome these, we introduce PaCo-FR, an unsupervised framework that combines masked image modeling with patch-pixel alignment. Our approach integrates three innovative components: (1) a structured masking strategy that preserves spatial coherence by aligning with semantically meaningful facial regions, (2) a novel patch-based codebook that enhances feature discrimination with multiple candidate tokens, and (3) spatial consistency constraints that preserve geometric relationships between facial components. PaCo-FR…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
