SuperFace: Preference-Aligned Facial Expression Estimation Beyond Pseudo Supervision
Zejian Kang, Xuanyang Xu, Wentao Yang, Kai Zheng, Yuanchen Fei, Hongyuan Zou, Hui Shan, Shuo Yang, Xiangru Huang

TL;DR
SuperFace introduces a human preference-driven approach to improve ARKit facial expression estimation, surpassing pseudo-label supervision by optimizing for perceptual expression fidelity.
Contribution
The paper proposes SuperFace, a preference-driven framework that moves ARKit facial expression estimation from pseudo-label imitation toward optimization for human-perceived expression fidelity.
Findings
SuperFace outperforms models trained with pseudo labels in expression fidelity.
Preference-driven optimization leads to more visually faithful facial animations.
The method effectively aligns predictions with human perceptual judgments.
Abstract
Accurate facial estimation is crucial for realistic digital human animation, and ARKit blendshape coefficients offer an interpretable representation by mapping facial motions to semantic animation controls. However, learning high-quality ARKit coefficient prediction remains limited by the absence of reliable ground-truth supervision. Existing methods typically rely on capture software such as Live Link Face to provide pseudo labels, which may contain noisy activations, biased coefficient magnitudes, and missing or inaccurate facial actions. Consequently, models trained with supervised learning tend to reproduce imperfect pseudo labels rather than optimize for perceptual expression fidelity. In this paper, we propose SuperFace, a preference-driven framework that moves ARKit facial expression estimation from pseudo-label imitation toward human-aligned perceptual optimization. Instead of…
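The contrast the abstract draws, between imitating pseudo labels and optimizing for human preference, can be illustrated with two loss functions. The abstract is truncated before it describes SuperFace's actual objective, so the sketch below is not the paper's method: it assumes a generic Bradley-Terry pairwise preference loss as a stand-in, and the function names and the idea of a scalar score per candidate prediction are hypothetical.

```python
import numpy as np

NUM_BLENDSHAPES = 52  # ARKit exposes 52 blendshape coefficients, each in [0, 1]


def pseudo_label_loss(pred, pseudo):
    """Supervised baseline: mean squared error against pseudo labels.

    Minimizing this reproduces the pseudo labels, including their noise.
    """
    pred = np.asarray(pred, dtype=float)
    pseudo = np.asarray(pseudo, dtype=float)
    return float(np.mean((pred - pseudo) ** 2))


def pairwise_preference_loss(score_preferred, score_rejected):
    """Bradley-Terry negative log-likelihood (assumed stand-in objective).

    Each pair holds scores for two candidate predictions of the same frame,
    where a human judged the first to look more faithful.
    Computes -log(sigmoid(s_preferred - s_rejected)) in a stable form.
    """
    margin = np.asarray(score_preferred, dtype=float) - np.asarray(
        score_rejected, dtype=float
    )
    return float(np.mean(np.logaddexp(0.0, -margin)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pseudo = rng.uniform(0.0, 1.0, size=NUM_BLENDSHAPES)
    pred = np.clip(pseudo + rng.normal(0.0, 0.05, size=NUM_BLENDSHAPES), 0.0, 1.0)
    print("pseudo-label MSE:", pseudo_label_loss(pred, pseudo))
    # The preference loss shrinks as the preferred candidate scores higher.
    print("agrees with human:", pairwise_preference_loss([2.0], [0.0]))
    print("disagrees with human:", pairwise_preference_loss([0.0], [2.0]))
```

The first loss can only be as good as the pseudo labels it copies, while the second depends only on which of two candidates a person preferred, so it never needs a ground-truth coefficient vector.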
