FlexAvatar: Flexible Large Reconstruction Model for Animatable Gaussian Head Avatars with Detailed Deformation

Cheng Peng; Zhuo Su; Liao Wang; Chen Guo; Zhaohu Li; Chengjiang Long; Zheng Lv; Jingxiang Sun; Chenyangguang Zhang; Yebin Liu

arXiv:2512.17717·cs.CV·December 22, 2025

Open Access

TL;DR

FlexAvatar is a transformer-based large reconstruction model that builds high-fidelity, animatable 3D Gaussian head avatars from single or sparse images, without camera poses or expression labels, and produces detailed expression-dependent deformations in real time.

Contribution

Proposes a flexible reconstruction framework that pairs a transformer-based reconstruction model with a lightweight UNet decoder conditioned on UV-space position maps, enabling detailed, expression-dependent deformations from single or sparse input images.
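One way to read "flexible" here: a fixed set of query tokens cross-attends to however many image tokens are supplied, so the size of the aggregated representation does not depend on the number of input views. The sketch below illustrates only that property, in plain Python. It assumes single-head dot-product attention with no learned projections; the function names, token values, and dimensions are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, tokens):
    """Each query attends over all input tokens; returns one vector per query."""
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in queries:
        # Scaled dot-product score between this query and every input token.
        scores = [scale * sum(qi * ti for qi, ti in zip(q, t)) for t in tokens]
        weights = softmax(scores)
        # Weighted average of the input tokens.
        out.append([sum(w * t[d] for w, t in zip(weights, tokens))
                    for d in range(dim)])
    return out


# Hypothetical toy data: 3 query tokens, feature dimension 4.
queries = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0]]

# Image tokens from one view, then from three views (values are made up).
one_view = [[0.5, 0.1, 0.0, 0.2],
            [0.0, 0.3, 0.9, 0.1]]
three_views = one_view + [[0.2, 0.7, 0.1, 0.0],
                          [0.9, 0.0, 0.4, 0.3],
                          [0.1, 0.1, 0.1, 0.8],
                          [0.6, 0.6, 0.0, 0.0]]

canon_1 = cross_attend(queries, one_view)
canon_3 = cross_attend(queries, three_views)

# The output shape is set by the queries, not by the number of inputs.
print(len(canon_1), len(canon_3))
```

Both calls return three vectors, one per query token, whether two or six image tokens are supplied. A real model would add learned projections, multiple heads, and stacked layers; none of that is shown here.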

Findings

01 Achieves superior 3D consistency and realism in dynamic head avatars.

02 Capable of real-time detailed expression-dependent deformation.

03 Enhances identity-specific details with a quick refinement process.

Abstract

We present FlexAvatar, a flexible large reconstruction model for high-fidelity 3D head avatars with detailed dynamic deformation from single or sparse images, without requiring camera poses or expression labels. It leverages a transformer-based reconstruction model with structured head query tokens as canonical anchors to aggregate flexible inputs (input-number-agnostic, camera-pose-free, and expression-free) into a robust canonical 3D representation. For detailed dynamic deformation, we introduce a lightweight UNet decoder conditioned on UV-space position maps, which can produce detailed expression-dependent deformations in real time. To better capture rare but critical expressions like wrinkles and bared teeth, we also adopt a data distribution adjustment strategy during training to balance the distribution of these expressions in the training set. Moreover, a lightweight 10-second…
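The abstract does not say how the data distribution is adjusted. As an illustration only, the sketch below uses inverse-frequency resampling, a common rebalancing technique that may differ from the authors' strategy. The expression labels, their counts, and the `balanced_weights` helper are all hypothetical.

```python
import random
from collections import Counter


def balanced_weights(labels):
    """Per-sample weights so that every label carries equal total weight."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


# Hypothetical training set: rare expressions are heavily outnumbered.
labels = ["neutral"] * 90 + ["wrinkle"] * 6 + ["bared_teeth"] * 4
weights = balanced_weights(labels)

# Draw sample indices with replacement according to the weights.
rng = random.Random(0)
indices = rng.choices(range(len(labels)), weights=weights, k=3000)
drawn = Counter(labels[i] for i in indices)

# Each label should now appear in roughly a third of the draws.
print(drawn)
```

With uniform sampling, "bared_teeth" would make up about 4% of the draws; with these weights each label is drawn with probability one third. A softer variant would raise the count to a power below 1 before inverting, which only partially flattens the distribution.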

Taxonomy

Topics: Face recognition and analysis · 3D Shape Modeling and Analysis · Human Motion and Animation