IDOL: Instant Photorealistic 3D Human Creation from a Single Image

Yiyu Zhuang; Jiaxi Lv; Hao Wen; Qing Shuai; Ailing Zeng; Hao Zhu; Shifeng Chen; Yujiu Yang; Xun Cao; Wei Liu

arXiv:2412.14963·cs.CV·March 26, 2025

TL;DR

IDOL introduces a scalable transformer-based method trained on a large synthetic dataset to instantly generate high-resolution, photorealistic 3D human avatars from a single image, enabling real-time animation and editing.

Contribution

The paper presents HuGe100K, a large-scale synthetic dataset, and a transformer model that predicts a 3D Gaussian representation for photorealistic human reconstruction from a single image.
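To make "3D Gaussian representation" concrete, here is a minimal sketch of the parameterization commonly used in 3D Gaussian Splatting: a position, per-axis scale, rotation quaternion, opacity, and color, with covariance built as R S Sᵀ Rᵀ. This page does not state IDOL's exact per-Gaussian attributes, so the class name `Gaussian3D`, its fields, and the helper functions are illustrative assumptions rather than the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z)


@dataclass
class Gaussian3D:
    """Hypothetical container for one 3D Gaussian primitive (assumed fields)."""
    position: Vec3   # center of the Gaussian
    scale: Vec3      # per-axis standard deviation
    rotation: Quat   # orientation as a quaternion (w, x, y, z)
    opacity: float   # in [0, 1]
    color: Vec3      # RGB in [0, 1]


def quat_to_matrix(q: Quat) -> List[List[float]]:
    """Convert a quaternion to a 3x3 rotation matrix, normalizing it first."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(g: Gaussian3D) -> List[List[float]]:
    """Covariance = R * diag(scale^2) * R^T, the usual splatting form."""
    r = quat_to_matrix(g.rotation)
    s2 = [s * s for s in g.scale]
    return [
        [sum(r[i][k] * s2[k] * r[j][k] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


# An axis-aligned Gaussian: the covariance is diagonal with squared scales.
g = Gaussian3D(
    position=(0.0, 0.0, 0.0),
    scale=(1.0, 2.0, 3.0),
    rotation=(1.0, 0.0, 0.0, 0.0),
    opacity=0.8,
    color=(0.5, 0.5, 0.5),
)
cov = covariance(g)
```

Factoring the covariance into a rotation and a scale keeps it symmetric and positive semi-definite by construction, which is why this form is a common choice when a network predicts Gaussian parameters directly.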

Findings

1. Reconstructs 3D humans at 1K resolution instantly on a single GPU.

2. Demonstrates high-quality, photorealistic avatars with pose and shape editing capabilities.

3. Validates effectiveness through comprehensive experiments.

Abstract

Creating a high-fidelity, animatable 3D full-body avatar from a single image is a challenging task due to the diverse appearance and poses of humans and the limited availability of high-quality training data. To achieve fast and high-quality human reconstruction, this work rethinks the task from the perspectives of dataset, model, and representation. First, we introduce a large-scale HUman-centric GEnerated dataset, HuGe100K, consisting of 100K diverse, photorealistic sets of human images. Each set contains 24-view frames in specific human poses, generated using a pose-controllable image-to-multi-view model. Next, leveraging the diversity in views, poses, and appearances within HuGe100K, we develop a scalable feed-forward transformer model to predict a 3D human Gaussian representation in a uniform space from a given human image. This model is trained to disentangle human pose, body…

Taxonomy

Topics: 3D Shape Modeling and Analysis · Human Motion and Animation

Methods: Sparse Evolutionary Training