HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion
Xian Liu, Jian Ren, Aliaksandr Siarohin, Ivan Skorokhodov, Yanyu Li, Dahua Lin, Xihui Liu, Ziwei Liu, Sergey Tulyakov

TL;DR
HyperHuman introduces a unified diffusion-based framework that leverages large-scale human-centric data and structural modeling to generate hyper-realistic, diverse human images with coherent poses and detailed geometry.
Contribution
The paper presents a novel Latent Structural Diffusion Model and a large-scale HumanVerse dataset for improved human image synthesis.
Findings
Achieves state-of-the-art realism in human image generation
Effectively models structural and appearance correlations
Produces diverse human images with coherent poses
Abstract
Despite significant advances in large-scale text-to-image models, achieving hyper-realistic human image generation remains a desirable yet unsolved task. Existing models like Stable Diffusion and DALL-E 2 tend to generate human images with incoherent parts or unnatural poses. To tackle these challenges, our key insight is that human images are inherently structural over multiple granularities, from the coarse-level body skeleton to fine-grained spatial geometry. Therefore, capturing such correlations between the explicit appearance and latent structure in one model is essential to generate coherent and natural human images. To this end, we propose a unified framework, HyperHuman, that generates in-the-wild human images of high realism and diverse layouts. Specifically, 1) we first build a large-scale human-centric dataset, named HumanVerse, which consists of 340M images with comprehensive…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis
Methods: Diffusion
