ORFormer: Occlusion-Robust Transformer for Accurate Facial Landmark Detection
Jui-Che Chiang, Hou-Ning Hu, Bo-Syuan Hou, Chia-Yu Tseng, Yu-Lun Liu, Min-Hung Chen, Yen-Yu Lin

TL;DR
ORFormer is a transformer-based facial landmark detection method that effectively identifies and recovers occluded facial regions, resulting in more accurate landmark localization under challenging conditions.
Contribution
The paper introduces ORFormer, a novel transformer architecture with messenger tokens that detect and recover occluded facial regions, improving landmark detection robustness.
Findings
Outperforms state-of-the-art methods on the WFLW and COFW datasets
Produces heatmaps resilient to partial occlusions
Enhances existing FLD methods with recovered features
Abstract
Although facial landmark detection (FLD) has gained significant progress, existing FLD methods still suffer from performance drops on partially non-visible faces, such as faces with occlusions or under extreme lighting conditions or poses. To address this issue, we introduce ORFormer, a novel transformer-based method that can detect non-visible regions and recover their missing features from visible parts. Specifically, ORFormer associates each image patch token with one additional learnable token called the messenger token. The messenger token aggregates features from all but its patch. This way, the consensus between a patch and other patches can be assessed by referring to the similarity between its regular and messenger embeddings, enabling non-visible region identification. Our method then recovers occluded patches with features aggregated by the messenger tokens. Leveraging the…
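The messenger-token idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the single-head attention without learned projections, the cosine-similarity consensus score, and the blending rule in `recover` are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; -inf entries receive zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def messenger_attention(patches, messengers):
    """Toy single-head attention (assumed form, no learned projections).

    patches, messengers: arrays of shape (n, d), with n >= 2.
    Returns regular embeddings, messenger embeddings, and a per-patch
    consensus score (cosine similarity between the two embeddings).
    """
    n, d = patches.shape
    scale = np.sqrt(d)

    # Regular tokens: ordinary self-attention over all patches.
    regular = softmax(patches @ patches.T / scale) @ patches

    # Messenger tokens: attend to every patch except their own,
    # enforced by masking the diagonal before the softmax.
    scores = messengers @ patches.T / scale
    scores[np.arange(n), np.arange(n)] = -np.inf
    messenger = softmax(scores) @ patches

    # Assumed consensus measure: low similarity hints at a non-visible patch.
    num = (regular * messenger).sum(axis=1)
    den = np.linalg.norm(regular, axis=1) * np.linalg.norm(messenger, axis=1) + 1e-8
    return regular, messenger, num / den

def recover(regular, messenger, consensus):
    # Assumed blending rule: the lower the consensus, the more the patch
    # is replaced by features aggregated from the other patches.
    w = np.clip(consensus, 0.0, 1.0)[:, None]
    return w * regular + (1.0 - w) * messenger
```

Because the diagonal is masked, the messenger embedding of a patch does not change when that patch's own content is altered, which is what makes it a usable reference for judging whether the patch agrees with the rest of the face.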
Taxonomy
Topics: Face recognition and analysis · Face and Expression Recognition
