HumanRAM: Feed-forward Human Reconstruction and Animation Model using Transformers
Zhiyuan Yu, Zhe Li, Hujun Bao, Can Yang, Xiaowei Zhou

TL;DR
HumanRAM is a novel feed-forward transformer-based model that enables real-time, high-quality 3D human reconstruction and animation from monocular or sparse images, surpassing previous methods in accuracy and generalization.
Contribution
HumanRAM unifies human reconstruction and animation in a single framework by feeding explicit SMPL-X pose conditions into a transformer-based large reconstruction model (LRM), so one feed-forward pass generalizes to new subjects without per-subject optimization.
Findings
Outperforms previous methods in reconstruction accuracy
Achieves high-fidelity pose-controlled animation
Demonstrates strong generalization on real-world datasets
Abstract
3D human reconstruction and animation are long-standing topics in computer graphics and vision. However, existing methods typically rely on sophisticated dense-view capture and/or time-consuming per-subject optimization procedures. To address these limitations, we propose HumanRAM, a novel feed-forward approach for generalizable human reconstruction and animation from monocular or sparse human images. Our approach integrates human reconstruction and animation into a unified framework by introducing explicit pose conditions, parameterized by a shared SMPL-X neural texture, into transformer-based large reconstruction models (LRM). Given monocular or sparse input images with associated camera parameters and SMPL-X poses, our model employs scalable transformers and a DPT-based decoder to synthesize realistic human renderings under novel viewpoints and novel poses. By leveraging the explicit…
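The abstract does not spell out how images and pose conditions reach the transformer, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the pose condition has already been rasterized into a per-pixel feature map aligned with each input view (for example by rendering a shared neural texture on the posed SMPL-X mesh), and that each view is then split into non-overlapping patches to form tokens. The function names `patchify` and `build_tokens`, the patch size, and the feature dimension are all hypothetical.

```python
import numpy as np


def patchify(x, patch):
    """Split an (H, W, C) array into non-overlapping patch tokens.

    Returns an array of shape (H/patch * W/patch, patch * patch * C).
    """
    h, w, c = x.shape
    assert h % patch == 0 and w % patch == 0, "H and W must be divisible by patch"
    x = x.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return x.reshape((h // patch) * (w // patch), patch * patch * c)


def build_tokens(images, pose_maps, patch=8):
    """Fuse each view with its pose-condition map and tokenize it.

    images:    (V, H, W, 3) RGB inputs, one per view.
    pose_maps: (V, H, W, D) per-pixel pose features (assumed to be given).
    Returns a (V * N, patch * patch * (3 + D)) token matrix.
    """
    tokens = []
    for img, pose in zip(images, pose_maps):
        fused = np.concatenate([img, pose], axis=-1)  # channel-wise fusion
        tokens.append(patchify(fused, patch))
    return np.concatenate(tokens, axis=0)


# Toy inputs: 2 views of 32x32 pixels with a 4-channel pose feature map.
rng = np.random.default_rng(0)
images = rng.random((2, 32, 32, 3))
pose_maps = rng.random((2, 32, 32, 4))
tokens = build_tokens(images, pose_maps)
print(tokens.shape)  # (32, 448)
```

In a full model, a token matrix like this would be linearly projected and passed through the transformer blocks, with a dense decoder producing the target-view rendering; those stages are omitted here because the truncated abstract does not describe them in detail.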
