SCE-MAE: Selective Correspondence Enhancement with Masked Autoencoder for Self-Supervised Landmark Estimation
Kejia Yin, Varshanth R. Rao, Ruowei Jiang, Xudong Liu, Parham Aarabi, David B. Lindell

TL;DR
SCE-MAE introduces a self-supervised framework using masked autoencoders and a novel correspondence refinement method to improve facial landmark estimation without annotated data.
Contribution
The paper proposes SCE-MAE, a new self-supervised landmark estimation method that leverages masked autoencoders and a local correspondence refinement strategy.
Findings
Outperforms existing state-of-the-art methods by roughly 20-44% on the landmark matching task.
Improves on them by roughly 9-15% on the landmark detection task.
Demonstrates robustness and effectiveness across extensive experiments.
Abstract
Self-supervised landmark estimation is a challenging task that demands the formation of locally distinct feature representations to identify sparse facial landmarks in the absence of annotated data. To tackle this task, existing state-of-the-art (SOTA) methods (1) extract coarse features from backbones that are trained with instance-level self-supervised learning (SSL) paradigms, which neglect the dense prediction nature of the task, (2) aggregate them into memory-intensive hypercolumn formations, and (3) supervise lightweight projector networks to naively establish full local correspondences among all pairs of spatial features. In this paper, we introduce SCE-MAE, a framework that (1) leverages the MAE, a region-level SSL method that naturally better suits the landmark prediction task, (2) operates on the vanilla feature map instead of on expensive hypercolumns, and (3) employs a…
Taxonomy
Topics: AI in cancer detection · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
Methods: Masked autoencoder
