Tiny is not small enough: High-quality, low-resource facial animation models through hybrid knowledge distillation

Zhen Han; Mattias Teye; Derek Yadgaroff; Judith Bütepage

arXiv:2507.18352·cs.GR·February 13, 2026

TL;DR

This paper presents a method for creating small, high-quality, real-time facial animation models suitable for on-device use in games, using hybrid knowledge distillation to overcome dataset limitations.

Contribution

The authors introduce a hybrid knowledge distillation approach with pseudo-labeling to develop tiny, efficient facial animation models that maintain quality while enabling real-time on-device inference.

Findings

01

Memory footprint reduced to 3.4 MB

02

Audio context requirement reduced to as little as 81 ms

03

Maintained high-quality facial animations
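
To put the two numeric findings in perspective, here is a rough back-of-the-envelope sketch. The sample rate, the weight precisions, the use of decimal megabytes, and the idea that the whole footprint consists of weights are all assumptions made for illustration; none of them come from the paper.

```python
# Assumptions (not from the paper): decimal megabytes, 16 kHz audio,
# and a footprint that consists entirely of model weights.
FOOTPRINT_BYTES = 3_400_000   # 3.4 MB, read as decimal megabytes
CONTEXT_MS = 81               # audio context from the findings
SAMPLE_RATE_HZ = 16_000       # assumed sample rate

# Candidate storage precisions (hypothetical; the paper's choice is not stated here).
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}


def max_parameters(footprint_bytes, bytes_per_weight):
    """Upper bound on the parameter count that fits in the footprint."""
    return footprint_bytes // bytes_per_weight


def context_samples(context_ms, sample_rate_hz):
    """Number of audio samples covered by a context window in milliseconds."""
    return context_ms * sample_rate_hz // 1000


param_budget = {
    name: max_parameters(FOOTPRINT_BYTES, size)
    for name, size in BYTES_PER_WEIGHT.items()
}
samples = context_samples(CONTEXT_MS, SAMPLE_RATE_HZ)

for name, count in param_budget.items():
    print(f"{name}: at most {count:,} parameters")
print(f"{CONTEXT_MS} ms at {SAMPLE_RATE_HZ} Hz = {samples} samples")
```

Under these assumptions, a 3.4 MB model stored in 32-bit floats has room for fewer than a million parameters, and an 81 ms window spans on the order of a thousand samples.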

Abstract

The training of high-quality, robust machine learning models for speech-driven 3D facial animation requires a large, diverse dataset of high-quality audio-animation pairs. To overcome the lack of such a dataset, recent work has introduced large pre-trained speech encoders that are robust to variations in the input audio and, therefore, enable the facial animation model to generalize across speakers, audio quality, and languages. However, the resulting facial animation models are prohibitively large and lend themselves only to offline inference on a dedicated machine. In this work, we explore on-device, real-time facial animation models in the context of game development. We overcome the lack of large datasets by using hybrid knowledge distillation with pseudo-labeling. Given a large audio dataset, we employ a high-performing teacher model to train very small student models. In contrast…
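
The pseudo-labeling step described in the abstract can be illustrated with a minimal sketch: a teacher labels unlabeled audio, and a student is fitted to those labels. Everything here is hypothetical and chosen only to keep the example runnable: the linear "teacher", the synthetic feature frames, the dimensions, and the learning rate. The student has the same capacity as the teacher in this toy, so it shows only the data flow, not the size reduction, and it does not cover how the paper makes the scheme "hybrid".

```python
import random

random.seed(0)

N_FEATURES = 4   # hypothetical size of one audio feature frame
N_OUTPUTS = 2    # hypothetical number of animation controls


def predict(weights, frame):
    """Apply a linear map (one weight row per output) to one feature frame."""
    return [sum(w * x for w, x in zip(row, frame)) for row in weights]


# Stand-in teacher: a fixed linear map (a real teacher would be a large model).
teacher_w = [[random.uniform(-1, 1) for _ in range(N_FEATURES)]
             for _ in range(N_OUTPUTS)]

# Unlabeled audio, represented here by synthetic feature frames.
unlabeled_audio = [[random.uniform(-1, 1) for _ in range(N_FEATURES)]
                   for _ in range(500)]

# Step 1: the teacher produces pseudo-labels for the unlabeled audio.
pseudo_labels = [predict(teacher_w, frame) for frame in unlabeled_audio]


def mse(weights, frames, targets):
    """Mean squared error between predictions and pseudo-labels."""
    total = 0.0
    for frame, target in zip(frames, targets):
        pred = predict(weights, frame)
        total += sum((p - t) ** 2 for p, t in zip(pred, target))
    return total / (len(frames) * N_OUTPUTS)


# Step 2: train the student on (audio, pseudo-label) pairs with plain SGD.
student_w = [[0.0] * N_FEATURES for _ in range(N_OUTPUTS)]
loss_before = mse(student_w, unlabeled_audio, pseudo_labels)

LEARNING_RATE = 0.1
for _ in range(20):  # epochs
    for frame, target in zip(unlabeled_audio, pseudo_labels):
        pred = predict(student_w, frame)
        for i in range(N_OUTPUTS):
            err = pred[i] - target[i]
            for j in range(N_FEATURES):
                student_w[i][j] -= LEARNING_RATE * err * frame[j]

loss_after = mse(student_w, unlabeled_audio, pseudo_labels)
print(f"student loss before: {loss_before:.4f}, after: {loss_after:.2e}")
```

The point of the sketch is that no ground-truth animation is needed for the unlabeled audio: the student's training targets come entirely from the teacher's outputs.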
