SemanticFace: Semantic Facial Action Estimation via Semantic Distillation in Interpretable Space

Zejian Kang; Kai Zheng; Yuanchen Fei; Wentao Yang; Hongyuan Zou; Xiangru Huang

arXiv:2603.14827·cs.CV·March 19, 2026

TL;DR

SemanticFace estimates facial actions from a single image in the interpretable ARKit blendshape space using two-stage semantic distillation. It is reported to improve coefficient accuracy and perceptual consistency, and to remain robust under domain shifts such as cartoon faces.

Contribution

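SemanticFace reformulates facial action estimation as structured semantic reasoning in the interpretable ARKit blendshape space. A two-stage process first derives structured semantic supervision from ground-truth ARKit coefficients, then distills it into a multimodal large language model that predicts the coefficients from images.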

Findings

01 Improves coefficient accuracy and perceptual consistency.

02 Enables strong cross-identity generalization.

03 Robust to domain shifts including cartoon faces.

Abstract

Facial action estimation from a single image is often formulated as predicting or fitting parameters in compact expression spaces, which lack explicit semantic interpretability. However, many practical applications, such as avatar control and human-computer interaction, require interpretable facial actions that correspond to meaningful muscle movements. In this work, we propose SemanticFace, a framework for facial action estimation in the interpretable ARKit blendshape space that reformulates coefficient prediction as structured semantic reasoning. SemanticFace adopts a two-stage semantic distillation paradigm: it first derives structured semantic supervision from ground-truth ARKit coefficients and then distills this knowledge into a multimodal large language model to predict interpretable facial action coefficients from images. Extensive experiments demonstrate that language-aligned…
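
To make the first stage more concrete, the following is a minimal Python sketch of how ground-truth ARKit coefficients might be turned into structured semantic records. It is not the authors' method: the intensity bins, the record fields, and the function names `intensity_label` and `describe_coefficients` are assumptions made for illustration. Only the blendshape names (`jawOpen`, `mouthSmileLeft`, `mouthSmileRight`, `browInnerUp`) come from ARKit itself.

```python
from typing import Dict, List

# Hypothetical intensity bins (upper bound, label). The abstract does not
# specify how coefficients are discretized; these thresholds are assumptions.
INTENSITY_BINS = [(0.15, "none"), (0.40, "slight"), (0.70, "moderate"), (1.01, "strong")]


def intensity_label(value: float) -> str:
    """Map a blendshape coefficient in [0, 1] to a coarse intensity word."""
    clipped = min(max(value, 0.0), 1.0)  # ARKit coefficients lie in [0, 1]
    for upper, label in INTENSITY_BINS:
        if clipped < upper:
            return label
    return INTENSITY_BINS[-1][1]


def describe_coefficients(coeffs: Dict[str, float]) -> List[Dict[str, object]]:
    """Turn coefficients into structured records, strongest action first.

    Actions labelled "none" are dropped so the record list only names
    blendshapes that are visibly active.
    """
    records = []
    for name, value in sorted(coeffs.items(), key=lambda kv: -kv[1]):
        label = intensity_label(value)
        if label == "none":
            continue
        records.append({"action": name, "intensity": label, "value": round(value, 2)})
    return records


# Illustrative input; the numbers are made up, not taken from the paper.
example = {
    "jawOpen": 0.62,
    "mouthSmileLeft": 0.35,
    "mouthSmileRight": 0.05,
    "browInnerUp": 0.81,
}
description = describe_coefficients(example)
for record in description:
    print(record)
```

Records of this kind could then be serialized as text and used as supervision targets for a multimodal language model, which is the role the abstract assigns to the second stage. The exact target format used by SemanticFace is not stated in the text above.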

Taxonomy

Topics: Face recognition and analysis · Emotion and Mood Recognition · Face Recognition and Perception