Precise Facial Landmark Detection by Dynamic Semantic Aggregation Transformer
Jun Wan, He Liu, Yujia Wu, Zhihui Lai, Wenwen Min, Jun Liu

TL;DR
This paper introduces a Dynamic Semantic-Aggregation Transformer (DSAT) that learns specialized features for face alignment, effectively handling challenging cases like large poses and occlusions, and outperforms existing methods.
Contribution
It proposes a novel DSAT framework combining dynamic semantic-aware partitioning and semantic specialization to improve face landmark detection accuracy.
Findings
Outperforms state-of-the-art face alignment models on popular datasets.
Effectively handles large pose variations and occlusions.
Harder samples benefit from activated feature channels.
Abstract
At present, deep neural network methods play a dominant role in the face alignment field. However, they generally use predefined network structures to predict landmarks, which tend to learn general features and lead to mediocre performance; for example, they perform well on neutral samples but struggle with faces exhibiting large poses or occlusions. Moreover, they cannot effectively deal with semantic gaps and ambiguities among features at different scales, which may hinder them from learning efficient features. To address these issues, we propose a Dynamic Semantic-Aggregation Transformer (DSAT) for learning more discriminative and representative features (i.e., specialized features). Specifically, a Dynamic Semantic-Aware (DSA) model is first proposed to partition samples into subsets and activate the specific pathways for them by estimating the semantic correlations…
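The routing idea described in the abstract (partitioning samples into subsets and activating specific pathways according to estimated semantic correlations) can be illustrated with a toy sketch. This is not the authors' DSA model: the cosine-similarity scoring, the softmax weighting, the top-k selection, and all names (`route_sample`, `prototypes`, `top_k`) are assumptions made purely for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_sample(feature, prototypes, top_k=1):
    """Score a sample against each pathway prototype and pick the top_k pathways.

    Returns (weights, active): the softmax weights over all pathways and the
    sorted indices of the pathways that would be activated for this sample.
    """
    sims = [cosine(feature, p) for p in prototypes]
    weights = softmax(sims)
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return weights, sorted(ranked[:top_k])


# Hypothetical pathway prototypes; the labels are illustrative, not from the paper.
prototypes = [
    [1.0, 0.0, 0.0],  # e.g. near-frontal faces
    [0.0, 1.0, 0.0],  # e.g. large poses
    [0.0, 0.0, 1.0],  # e.g. occlusions
]

weights, active = route_sample([0.1, 0.9, 0.2], prototypes, top_k=1)
print(active)  # index of the pathway selected for this sample
```

In this toy setup each sample activates only the pathways whose prototype it most resembles, so different subsets of samples exercise different parts of the model. How the actual DSA model estimates semantic correlations is not specified in the truncated abstract above.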
Taxonomy
Topics: Face recognition and analysis · Face and Expression Recognition
Methods: Attention Is All You Need · Absolute Position Encodings · Residual Connection · Adam · Softmax · Label Smoothing · Dropout · Dense Connections · Layer Normalization · Linear Layer
