Dream, Lift, Animate: From Single Images to Animatable Gaussian Avatars
Marcel C. Bühler, Ye Yuan, Xueting Li, Yangyi Huang, Koki Nagano, Umar Iqbal

TL;DR
This paper presents Dream, Lift, Animate (DLA), a framework that reconstructs and animates 3D human avatars from a single image by leveraging multi-view generation, 3D Gaussian lifting, and UV-space mapping, enabling real-time rendering and editing.
Contribution
The novel DLA framework combines diffusion-based multi-view synthesis with 3D Gaussian lifting and UV-space mapping for high-fidelity, animatable 3D avatars from a single image.
Findings
Outperforms state-of-the-art methods on the ActorsHQ and 4D-Dress datasets.
Enables real-time rendering and editing of avatars.
Preserves fine visual details during animation.
Abstract
We introduce Dream, Lift, Animate (DLA), a novel framework that reconstructs animatable 3D human avatars from a single image. This is achieved by leveraging multi-view generation, 3D Gaussian lifting, and pose-aware UV-space mapping of 3D Gaussians. Given an image, we first dream plausible multi-views using a video diffusion model, capturing rich geometric and appearance details. These views are then lifted into unstructured 3D Gaussians. To enable animation, we propose a transformer-based encoder that models global spatial relationships and projects these Gaussians into a structured latent representation aligned with the UV space of a parametric body model. This latent code is decoded into UV-space Gaussians that can be animated via body-driven deformation and rendered conditioned on pose and viewpoint. By anchoring Gaussians to the UV manifold, our method ensures consistency during…
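The body-driven deformation step can be illustrated with a small sketch. The abstract does not spell out the deformation model, so this example assumes plain linear blend skinning (LBS) applied to Gaussian centers anchored at UV texels. The function names, the 2x2 UV grid, the two-joint skeleton, and the skinning weights are all hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np


def skin_gaussians(canonical_xyz, skin_weights, joint_transforms):
    """Pose Gaussian centers with linear blend skinning (assumed model).

    canonical_xyz:    (N, 3) Gaussian centers in the canonical pose
    skin_weights:     (N, J) per-Gaussian joint weights, rows sum to 1
    joint_transforms: (J, 4, 4) rigid transform of each joint
    """
    # Blend the per-joint transforms for every Gaussian: (N, 4, 4)
    blended = np.einsum("nj,jab->nab", skin_weights, joint_transforms)
    # Lift centers to homogeneous coordinates: (N, 4)
    ones = np.ones((canonical_xyz.shape[0], 1))
    homo = np.concatenate([canonical_xyz, ones], axis=1)
    # Apply each blended transform to its own Gaussian center
    posed = np.einsum("nab,nb->na", blended, homo)
    return posed[:, :3]


def translation(t):
    """Build a 4x4 homogeneous translation matrix (illustrative helper)."""
    T = np.eye(4)
    T[:3, 3] = t
    return T


# Hypothetical toy setup: a 2x2 UV map with one Gaussian per texel.
uv_xyz = np.array(
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
)
# Two joints: the first stays fixed, the second moves 2 units along z.
weights = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 0.0]])
transforms = np.stack([np.eye(4), translation([0.0, 0.0, 2.0])])

posed_xyz = skin_gaussians(uv_xyz, weights, transforms)
print(posed_xyz)
```

Because every Gaussian keeps its UV texel index while only its 3D position changes, per-texel attributes such as color or opacity would follow the body surface as the pose changes. A full pipeline would presumably also transform each Gaussian's rotation and covariance, which this sketch omits.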
Taxonomy
Topics: Architecture and Computational Design
