TL;DR
SwinFace is a multi-task transformer model that simultaneously performs face recognition, expression recognition, age estimation, and attribute prediction, improving accuracy and efficiency through shared features and task-specific modules.
Contribution
The paper introduces SwinFace, a unified multi-task transformer architecture with a novel Multi-Level Channel Attention module for improved face analysis performance.
Findings
Achieves state-of-the-art accuracy on facial expression recognition (90.97% on RAF-DB).
Attains a 0.22 ε-error on the CLAP2015 age estimation benchmark.
Demonstrates superior multi-task learning capabilities.
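The ε-error quoted above is not defined in this summary. The sketch below assumes it is the apparent-age metric used in the ChaLearn Looking at People challenges, where each face has a mean annotated age μ and a standard deviation σ across annotators, and a prediction x is scored as ε = 1 − exp(−(x − μ)² / (2σ²)). The function names are hypothetical and the sample values are made up for illustration; they are not results from the paper.

```python
import math


def epsilon_error(predicted_age, mean_age, std_age):
    """Per-image epsilon-error: 0 for a perfect prediction, approaching 1 for a poor one.

    Assumes the ChaLearn-style definition: 1 - exp(-(x - mu)^2 / (2 * sigma^2)).
    """
    if std_age <= 0:
        raise ValueError("std_age must be positive")
    return 1.0 - math.exp(-((predicted_age - mean_age) ** 2) / (2.0 * std_age ** 2))


def mean_epsilon_error(predictions, means, stds):
    """Average the per-image epsilon-error over a dataset."""
    if not (len(predictions) == len(means) == len(stds)) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [epsilon_error(x, mu, sigma) for x, mu, sigma in zip(predictions, means, stds)]
    return sum(errors) / len(errors)


# Illustrative values only (not taken from any benchmark).
score = mean_epsilon_error([30.0, 32.0], [30.0, 30.0], [2.0, 2.0])
```

Because σ sits in the denominator, a miss of a given size is penalized less on faces where annotators themselves disagreed about the age.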
Abstract
In recent years, vision transformers have been introduced into face recognition and analysis and have achieved performance breakthroughs. However, most previous methods generally train a single model or an ensemble of models to perform the desired task, which ignores the synergy among different tasks and fails to achieve improved prediction accuracy, increased data efficiency, and reduced training time. This paper presents a multi-purpose algorithm for simultaneous face recognition, facial expression recognition, age estimation, and face attribute estimation (40 attributes including gender) based on a single Swin Transformer. Our design, the SwinFace, consists of a single shared backbone together with a subnet for each set of related tasks. To address the conflicts among multiple tasks and meet the different demands of tasks, a Multi-Level Channel Attention (MLCA) module is integrated…
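The shared-backbone layout described in the abstract can be illustrated with a toy sketch: one feature extractor runs once per input, and a separate head per task group reads the same features. This is not the SwinFace implementation. The backbone here is a single random linear map standing in for a Swin Transformer, the MLCA module is omitted, and all class names, function names, and dimensions are hypothetical except the 40 face attributes mentioned in the abstract.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Random weight matrix standing in for a trained linear layer (toy only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights, x):
    """Matrix-vector product using plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class SharedBackboneModel:
    """One shared feature extractor plus one head per task group."""

    def __init__(self, in_dim, feat_dim, head_dims, seed=0):
        rng = random.Random(seed)
        self.backbone = make_linear(in_dim, feat_dim, rng)
        self.heads = {name: make_linear(feat_dim, dim, rng) for name, dim in head_dims.items()}
        self.backbone_calls = 0

    def forward(self, x):
        # The shared features are computed once and reused by every head.
        self.backbone_calls += 1
        features = apply_linear(self.backbone, x)
        return {name: apply_linear(w, features) for name, w in self.heads.items()}


# Output sizes are assumptions for illustration, apart from the 40 attributes.
model = SharedBackboneModel(
    in_dim=4,
    feat_dim=3,
    head_dims={"recognition": 5, "expression": 7, "age": 1, "attributes": 40},
)
outputs = model.forward([1.0, 0.5, -0.5, 2.0])
```

The point of the layout is that adding a task adds only a head, while the cost of the backbone is paid once per image regardless of how many tasks are served.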
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Stochastic Depth · Linear Layer · Layer Normalization · Dense Connections · Absolute Position Encodings
