MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement

Xu He; Xiaoyu Li; Di Kang; Jiangnan Ye; Chaopeng Zhang; Liyang Chen; Xiangjun Gao; Han Zhang; Zhiyong Wu; Haolin Zhuang

arXiv:2408.14211·cs.CV·August 27, 2024

TL;DR

MagicMan is a human novel view synthesis method that combines 3D-aware diffusion, hybrid multi-view attention, and iterative refinement to produce consistent, high-quality multi-view images from a single reference image, which in turn improves 3D human reconstruction.

Contribution

It introduces a multi-view diffusion model with hybrid attention, geometry-aware dual branches, and iterative refinement for better 3D human synthesis from a single image.
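As a rough illustration of what attention across views can look like, the NumPy sketch below contrasts a cheap variant (each spatial position attends only to the same position in the other views) with a thorough one (every token attends to every token in every view), then sums them. This is a sketch under stated assumptions, not the authors' implementation: the function names, the tensor layout `(views, tokens, dim)`, the absence of learned projections, and the additive combination are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Plain scaled dot-product attention; q, k, v have shape (..., L, D).
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def per_position_view_attention(x):
    # x: (V, N, D). Each of the N positions attends over the V views
    # at that same position only, so the cost scales with N * V^2.
    xt = np.swapaxes(x, 0, 1)            # (N, V, D)
    out = attention(xt, xt, xt)          # (N, V, D)
    return np.swapaxes(out, 0, 1)        # (V, N, D)

def joint_view_attention(x):
    # x: (V, N, D). All V * N tokens attend to each other,
    # so the cost scales with (V * N)^2.
    num_views, num_tokens, dim = x.shape
    flat = x.reshape(1, num_views * num_tokens, dim)
    return attention(flat, flat, flat).reshape(num_views, num_tokens, dim)

def hybrid_view_attention(x):
    # Hypothetical combination: residual sum of both variants.
    return x + per_position_view_attention(x) + joint_view_attention(x)

if __name__ == "__main__":
    features = np.random.default_rng(0).normal(size=(4, 6, 8))
    print(hybrid_view_attention(features).shape)
```

In a real diffusion backbone the joint variant would typically be restricted to low-resolution feature maps to keep memory manageable; that restriction is omitted here for brevity.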

Findings

01. Outperforms existing methods in novel view synthesis.

02. Enhances 3D human reconstruction accuracy.

03. Achieves high consistency across generated views.

Abstract

Existing works in single-image human reconstruction suffer from weak generalizability due to insufficient training data, or from 3D inconsistencies due to a lack of comprehensive multi-view knowledge. In this paper, we introduce MagicMan, a human-specific multi-view diffusion model designed to generate high-quality novel view images from a single reference image. At its core, we leverage a pre-trained 2D diffusion model as the generative prior for generalizability, with the parametric SMPL-X model as the 3D body prior to promote 3D awareness. To tackle the critical challenge of maintaining consistency while achieving dense multi-view generation for improved 3D human reconstruction, we first introduce hybrid multi-view attention to facilitate both efficient and thorough information interchange across different views. Additionally, we present a geometry-aware dual branch to perform concurrent…

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Vision and Imaging

Methods: Softmax · Attention Is All You Need · Diffusion