LiftAvatar: Kinematic-Space Completion for Expression-Controlled 3D Gaussian Avatar Animation

Hualiang Wei; Shunran Jia; Jialun Liu; Wenhui Li

arXiv:2603.02129 · cs.CV · March 3, 2026

Open Access

TL;DR

LiftAvatar is a novel method that completes sparse monocular observations in kinematic space to enable high-fidelity, expression-controlled 3D avatar animation with improved realism and consistency.

Contribution

It introduces a kinematic-space completion framework with multi-granularity control and multi-reference conditioning for enhanced 3D avatar animation.

Findings

01 Boosts animation quality and metrics of state-of-the-art methods

02 Enables expressive and consistent avatar animations from monocular videos

03 Addresses artifacts caused by sparse kinematic cues

Abstract

We present LiftAvatar, a new paradigm that completes sparse monocular observations in kinematic space (e.g., facial expressions and head pose) and uses the completed signals to drive high-fidelity avatar animation. LiftAvatar is a fine-grained, expression-controllable large-scale video diffusion Transformer that synthesizes high-quality, temporally coherent expression sequences conditioned on single or multiple reference images. The key idea is to lift incomplete input data into a richer kinematic representation, thereby strengthening both reconstruction and animation in downstream 3D avatar pipelines. To this end, we introduce (i) a multi-granularity expression control scheme that combines shading maps with expression coefficients for precise and stable driving, and (ii) a multi-reference conditioning mechanism that aggregates complementary cues from multiple frames, enabling strong 3D…
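As a rough illustration of the two ideas named in the abstract, the sketch below shows one generic way to (a) pool features from several reference frames with attention-style weights and (b) pair a per-pixel shading value with a global expression-coefficient vector. It is not the paper's architecture: the function names `aggregate_references` and `build_condition`, the dot-product scoring, and the list-based feature layout are all assumptions made only for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_references(query, reference_features):
    """Hypothetical multi-reference pooling (assumed, not from the paper).

    Scores each reference feature vector against the query with a scaled
    dot product, then returns the weighted average and the weights.
    """
    dim = len(query)
    scores = [
        sum(q * r for q, r in zip(query, ref)) / math.sqrt(dim)
        for ref in reference_features
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * ref[i] for w, ref in zip(weights, reference_features))
        for i in range(dim)
    ]
    return pooled, weights


def build_condition(expression_coeffs, shading_map):
    """Hypothetical multi-granularity signal (assumed, not from the paper).

    Attaches the global coefficient vector (coarse cue) to every pixel's
    shading value (fine cue), giving one feature list per pixel.
    """
    return [
        [[shade] + list(expression_coeffs) for shade in row]
        for row in shading_map
    ]


if __name__ == "__main__":
    pooled, weights = aggregate_references(
        [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
    )
    print(weights)  # the first reference matches the query and gets more weight
    print(build_condition([0.2, 0.5], [[0.1, 0.9]]))
```

In this toy setup, a reference frame whose features align with the query contributes more to the pooled vector, which is one simple reading of "aggregating complementary cues from multiple frames"; a real video diffusion Transformer would operate on learned token embeddings rather than hand-written lists.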

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation