Human Video Generation from a Single Image with 3D Pose and View Control
Tiantian Wang, Chun-Han Yao, Tao Hu, Mallikarjun Byrasandra Ramalinga Reddy, Ming-Hsuan Yang, Varun Jampani

TL;DR
This paper introduces HVG, a diffusion-based model that generates high-quality, multi-view, and temporally coherent human videos from a single image with 3D pose and view control. It addresses a key challenge of single-image human animation: inferring view-consistent, motion-dependent clothing wrinkles.
Contribution
HVG is a novel latent video diffusion model that incorporates articulated pose modulation, view and temporal alignment, and progressive spatio-temporal sampling for improved human video synthesis.
Findings
HVG outperforms existing methods in quality and consistency.
It effectively generates multi-view, long-duration human videos.
The model maintains view and temporal coherence across frames.
Abstract
Recent diffusion methods have made significant progress in generating videos from single images due to their powerful visual generation capabilities. However, challenges persist in image-to-video synthesis, particularly in human video generation, where inferring view-consistent, motion-dependent clothing wrinkles from a single image remains a formidable problem. In this paper, we present Human Video Generation in 4D (HVG), a latent video diffusion model capable of generating high-quality, multi-view, spatiotemporally coherent human videos from a single image with 3D pose and view control. HVG achieves this through three key designs: (i) Articulated Pose Modulation, which captures the anatomical relationships of 3D joints via a novel dual-dimensional bone map and resolves self-occlusions across views by introducing 3D information; (ii) View and Temporal Alignment, which ensures…
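The abstract does not specify how the dual-dimensional bone map is built. The sketch below illustrates the general idea it names: using 3D joint information to resolve self-occlusions when bones are drawn into an image-space map. It assumes, hypothetically, that the map pairs a 2D bone-label channel with a per-pixel depth channel, and that joints are projected with a simple pinhole camera. The names `project`, `render_bone_maps`, `focal` and `samples` are illustrative only and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]  # (X, Y, Z) in camera coordinates, Z > 0


def project(joint: Joint, focal: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Pinhole projection (assumed camera model): pixel (u, v) plus camera-space depth Z."""
    x, y, z = joint
    return focal * x / z + cx, focal * y / z + cy, z


def render_bone_maps(
    joints: Sequence[Joint],
    bones: Sequence[Tuple[int, int]],
    height: int,
    width: int,
    focal: float,
    samples: int = 64,
) -> Tuple[List[List[int]], List[List[float]]]:
    """Hypothetical two-channel bone map.

    label_map[v][u]: 1-based index of the nearest bone covering the pixel (0 = background).
    depth_map[v][u]: camera-space depth of that bone (inf = background).
    A z-buffer keeps the closest bone, so an occluded limb never overwrites an occluding one.
    """
    cx, cy = width / 2.0, height / 2.0
    label_map = [[0] * width for _ in range(height)]
    depth_map = [[float("inf")] * width for _ in range(height)]
    projected = [project(j, focal, cx, cy) for j in joints]
    for b, (i, k) in enumerate(bones):
        u0, v0, z0 = projected[i]
        u1, v1, z1 = projected[k]
        # Sample points along the projected segment (a simple line rasterizer).
        for s in range(samples + 1):
            t = s / samples
            u = round(u0 + t * (u1 - u0))
            v = round(v0 + t * (v1 - v0))
            z = z0 + t * (z1 - z0)  # screen-space depth interpolation (an approximation)
            if 0 <= u < width and 0 <= v < height and z < depth_map[v][u]:
                depth_map[v][u] = z
                label_map[v][u] = b + 1
    return label_map, depth_map


# Two crossing bones: bone 1 is horizontal at depth 4, bone 2 is vertical and closer (depth 2).
joints = [(-1.0, 0.0, 4.0), (1.0, 0.0, 4.0), (0.0, -1.0, 2.0), (0.0, 1.0, 2.0)]
bones = [(0, 1), (2, 3)]
label_map, depth_map = render_bone_maps(joints, bones, height=8, width=8, focal=4.0)
```

At the crossing pixel, the closer bone wins, so the depth channel carries exactly the self-occlusion information a plain 2D skeleton drawing would lose. A real implementation would more likely use a learned encoder on top of such maps and thicker, perspective-correct rasterization.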
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · 3D Shape Modeling and Analysis
