SemanticFace: Semantic Facial Action Estimation via Semantic Distillation in Interpretable Space
Zejian Kang, Kai Zheng, Yuanchen Fei, Wentao Yang, Hongyuan Zou, Xiangru Huang

TL;DR
SemanticFace is a framework for facial action estimation that applies semantic distillation in the interpretable ARKit blendshape space, with reported gains in accuracy, interpretability, and robustness across domains.
Contribution
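It reformulates facial action estimation as structured semantic reasoning in an interpretable space, using a two-stage distillation process: structured semantic supervision is derived from ground-truth ARKit coefficients, then distilled into a multimodal large language model.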
Findings
Improves coefficient accuracy and perceptual consistency.
Enables strong cross-identity generalization.
Robust to domain shifts including cartoon faces.
Abstract
Facial action estimation from a single image is often formulated as predicting or fitting parameters in compact expression spaces, which lack explicit semantic interpretability. However, many practical applications, such as avatar control and human-computer interaction, require interpretable facial actions that correspond to meaningful muscle movements. In this work, we propose SemanticFace, a framework for facial action estimation in the interpretable ARKit blendshape space that reformulates coefficient prediction as structured semantic reasoning. SemanticFace adopts a two-stage semantic distillation paradigm: it first derives structured semantic supervision from ground-truth ARKit coefficients and then distills this knowledge into a multimodal large language model to predict interpretable facial action coefficients from images. Extensive experiments demonstrate that language-aligned…
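To make the first stage more concrete, the following is a minimal sketch of how ARKit blendshape coefficients (each in the range 0 to 1) could be turned into a structured textual description. It is not the authors' method: the function name describe_coefficients, the activity threshold, and the intensity buckets are all hypothetical choices made for illustration, and the abstract does not specify the actual format of the semantic supervision.

```python
from typing import Dict, List

# Hypothetical values, chosen only for illustration (not from the paper).
ACTIVE_THRESHOLD = 0.05  # coefficients below this are treated as inactive
INTENSITY_BUCKETS = [(0.15, "slight"), (0.5, "moderate"), (1.0, "strong")]


def describe_coefficients(coeffs: Dict[str, float]) -> List[str]:
    """Map ARKit-style blendshape coefficients to short text descriptions.

    Keys are blendshape names such as "jawOpen"; values must lie in [0, 1].
    Active blendshapes are listed from strongest to weakest.
    """
    lines = []
    for name, value in sorted(coeffs.items(), key=lambda kv: -kv[1]):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} is outside [0, 1]: {value}")
        if value < ACTIVE_THRESHOLD:
            continue  # skip blendshapes that are effectively at rest
        # Pick the first bucket whose upper bound covers the value.
        label = next(lbl for upper, lbl in INTENSITY_BUCKETS if value <= upper)
        lines.append(f"{name}: {label} ({value:.2f})")
    return lines


if __name__ == "__main__":
    example = {"jawOpen": 0.62, "mouthSmileLeft": 0.10, "eyeBlinkLeft": 0.01}
    for line in describe_coefficients(example):
        print(line)
```

A description of this kind could serve as a language-side training target paired with an input image, which is one plausible reading of "structured semantic supervision"; the paper itself should be consulted for the actual scheme.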
Taxonomy
Topics: Face recognition and analysis · Emotion and Mood Recognition · Face Recognition and Perception
