Precise Facial Landmark Detection by Dynamic Semantic Aggregation Transformer

Jun Wan; He Liu; Yujia Wu; Zhihui Lai; Wenwen Min; Jun Liu

arXiv:2412.00740·cs.CV·December 3, 2024

TL;DR

This paper introduces a Dynamic Semantic-Aggregation Transformer (DSAT) that learns specialized features for face alignment, effectively handling challenging cases like large poses and occlusions, and outperforms existing methods.

Contribution

It proposes a novel DSAT framework combining dynamic semantic-aware partitioning and semantic specialization to improve face landmark detection accuracy.

Findings

01 Outperforms state-of-the-art face alignment models on popular datasets.

02 Effectively handles large pose variations and occlusions.

03 Harder samples benefit from activated feature channels.

Abstract

At present, deep neural network methods play a dominant role in the face alignment field. However, they generally use predefined network structures to predict landmarks, which tend to learn general features and lead to mediocre performance; e.g., they perform well on neutral samples but struggle with faces exhibiting large poses or occlusions. Moreover, they cannot effectively deal with semantic gaps and ambiguities among features at different scales, which may hinder them from learning efficient features. To address these issues, we propose a Dynamic Semantic-Aggregation Transformer (DSAT) for learning more discriminative and representative features (i.e., specialized features). Specifically, a Dynamic Semantic-Aware (DSA) model is first proposed to partition samples into subsets and activate the specific pathways for them by estimating the semantic correlations…

Code & Models

Repositories

germino-liuhe/dsat (PyTorch, official)

Taxonomy

Topics: Face recognition and analysis · Face and Expression Recognition

Methods: Attention Is All You Need · Absolute Position Encodings · Residual Connection · Adam · Softmax · Label Smoothing · Dropout · Dense Connections · Layer Normalization · Linear Layer