LiftAvatar: Kinematic-Space Completion for Expression-Controlled 3D Gaussian Avatar Animation
Hualiang Wei, Shunran Jia, Jialun Liu, Wenhui Li

TL;DR
LiftAvatar is a novel method that completes sparse monocular observations in kinematic space to enable high-fidelity, expression-controlled 3D avatar animation with improved realism and consistency.
Contribution
LiftAvatar introduces a kinematic-space completion framework with multi-granularity expression control and multi-reference conditioning for enhanced 3D avatar animation.
Findings
Improves the animation quality and quantitative metrics of state-of-the-art 3D avatar methods
Enables expressive and consistent avatar animations from monocular videos
Addresses artifacts caused by sparse kinematic cues
Abstract
We present LiftAvatar, a new paradigm that completes sparse monocular observations in kinematic space (e.g., facial expressions and head pose) and uses the completed signals to drive high-fidelity avatar animation. LiftAvatar is a fine-grained, expression-controllable large-scale video diffusion Transformer that synthesizes high-quality, temporally coherent expression sequences conditioned on single or multiple reference images. The key idea is to lift incomplete input data into a richer kinematic representation, thereby strengthening both reconstruction and animation in downstream 3D avatar pipelines. To this end, we introduce (i) a multi-granularity expression control scheme that combines shading maps with expression coefficients for precise and stable driving, and (ii) a multi-reference conditioning mechanism that aggregates complementary cues from multiple frames, enabling strong 3D…
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation
