GUAVA: Generalizable Upper Body 3D Gaussian Avatar
Dongbin Zhang, Yunfei Liu, Lijian Lin, Ye Zhu, Yang Li, Minghan Qin, Yu Li, Haoqian Wang

TL;DR
GUAVA is a framework that reconstructs a high-quality, animatable 3D upper-body avatar from a single image. It is faster than previous methods, renders at higher quality, and supports real-time animation.
Contribution
The paper introduces an expressive human model (EHM) and a fast framework for reconstructing animatable upper-body 3D Gaussian avatars from a single image, improving on previous methods in both reconstruction speed and rendering quality.
Findings
Reconstruction time is reduced to 0.1 seconds.
Outperforms previous methods in rendering quality.
Supports real-time animation and rendering.
Abstract
Reconstructing a high-quality, animatable 3D human avatar with expressive facial and hand motions from a single image has gained significant attention due to its broad application potential. Existing 3D human avatar reconstruction methods typically require multi-view or monocular videos and training on individual IDs, which is both complex and time-consuming. Furthermore, limited by SMPLX's expressiveness, these methods often focus on body motion but struggle with facial expressions. To address these challenges, we first introduce an expressive human model (EHM) to enhance facial expression capabilities and develop an accurate tracking method. Based on this template model, we propose GUAVA, the first framework for fast animatable upper-body 3D Gaussian avatar reconstruction. We leverage inverse texture mapping and projection sampling techniques to infer Ubody (upper-body) Gaussians from a single image.…
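The abstract names projection sampling but does not describe it here. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below projects template vertices into the image with a pinhole camera and bilinearly samples a feature map at the projected pixels. The camera model, the nested-list feature map layout, and the names project_point, bilinear_sample, and sample_vertex_features are all assumptions made for this example.

```python
import math


def project_point(point_cam, focal, center):
    """Pinhole projection of one camera-space point (x, y, z) to pixel (u, v).

    Assumed camera model: a single focal length and a principal point.
    """
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal * x / z + center[0], focal * y / z + center[1])


def bilinear_sample(feature_map, u, v):
    """Bilinearly interpolate feature_map[row][col] (a list of C floats) at (u, v).

    u indexes columns and v indexes rows; coordinates are clamped to the map.
    """
    height, width = len(feature_map), len(feature_map[0])
    u = min(max(u, 0.0), width - 1.0)
    v = min(max(v, 0.0), height - 1.0)
    u0, v0 = int(math.floor(u)), int(math.floor(v))
    u1, v1 = min(u0 + 1, width - 1), min(v0 + 1, height - 1)
    du, dv = u - u0, v - v0
    sampled = []
    for c in range(len(feature_map[0][0])):
        top = feature_map[v0][u0][c] * (1 - du) + feature_map[v0][u1][c] * du
        bottom = feature_map[v1][u0][c] * (1 - du) + feature_map[v1][u1][c] * du
        sampled.append(top * (1 - dv) + bottom * dv)
    return sampled


def sample_vertex_features(vertices_cam, feature_map, focal, center):
    """Return one sampled feature vector per template vertex."""
    return [
        bilinear_sample(feature_map, *project_point(p, focal, center))
        for p in vertices_cam
    ]


# Toy 4x4 feature map whose two channels store (column, row), so the sampled
# feature at pixel (u, v) should reproduce (u, v).
toy_map = [[[float(col), float(row)] for col in range(4)] for row in range(4)]
toy_vertices = [(0.0, 0.0, 1.0), (0.25, -0.5, 1.0)]
features = sample_vertex_features(toy_vertices, toy_map, focal=2.0, center=(1.5, 1.5))
```

In a learned pipeline the feature map would come from an image encoder and the sampled vectors would feed a network that predicts per-vertex Gaussian attributes; that part is outside this sketch.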
Taxonomy
Topics: Human Pose and Action Recognition · Context-Aware Activity Recognition Systems · Balance, Gait, and Falls Prevention
Methods: Softmax · Attention Is All You Need · Focus · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
