Human4DiT: 360-degree Human Video Generation with 4D Diffusion Transformer
Ruizhi Shao, Youxin Pang, Zerong Zheng, Jingxiang Sun, Yebin Liu

TL;DR
This paper introduces Human4DiT, a 4D diffusion transformer framework that generates high-quality, 360-degree human videos from a single image, capturing complex motions and viewpoints with global coherence.
Contribution
The paper proposes a hierarchical 4D transformer architecture that combines diffusion transformers with CNNs for efficient, coherent 360-degree human video synthesis from a single image.
Findings
Successfully generates realistic 360-degree human videos
Outperforms previous GAN and diffusion-based methods in motion complexity and viewpoint variation
Demonstrates potential for VR and animation applications
Abstract
We present a novel approach for generating 360-degree high-quality, spatio-temporally coherent human videos from a single image. Our framework combines the strengths of diffusion transformers for capturing global correlations across viewpoints and time, and CNNs for accurate condition injection. The core is a hierarchical 4D transformer architecture that factorizes self-attention across views, time steps, and spatial dimensions, enabling efficient modeling of the 4D space. Precise conditioning is achieved by injecting human identity, camera parameters, and temporal signals into the respective transformers. To train this model, we collect a multi-dimensional dataset spanning images, videos, multi-view data, and limited 4D footage, along with a tailored multi-dimensional training strategy. Our approach overcomes the limitations of previous methods based on generative adversarial networks…
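The factorized self-attention described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the identity (weight-free) projections, the residual connections, and the spatial, then temporal, then view ordering are all assumptions made for illustration. It only shows how attending along one axis at a time over a tensor of shape (views, time steps, spatial tokens, channels) avoids full attention over every token in the 4D volume.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    # Single-head self-attention over the second-to-last axis.
    # Assumption: identity query/key/value projections (no learned weights).
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ x


def factorized_4d_attention(x):
    # x has shape (V, T, N, C): views, time steps, spatial tokens, channels.
    # The ordering of the three stages below is an illustrative assumption.

    # Spatial stage: tokens attend within each (view, frame) pair.
    x = x + self_attention(x)

    # Temporal stage: bring T to the token axis, giving shape (V, N, T, C).
    xt = np.swapaxes(x, 1, 2)
    xt = xt + self_attention(xt)
    x = np.swapaxes(xt, 1, 2)

    # View stage: bring V to the token axis, giving shape (T, N, V, C).
    xv = np.transpose(x, (1, 2, 0, 3))
    xv = xv + self_attention(xv)
    return np.transpose(xv, (2, 0, 1, 3))


def attention_pair_counts(V, T, N):
    # Number of query-key pairs scored by full versus factorized attention.
    full = (V * T * N) ** 2
    factorized = V * T * N**2 + V * N * T**2 + T * N * V**2
    return full, factorized


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 16, 32))
y = factorized_4d_attention(x)
full_pairs, factorized_pairs = attention_pair_counts(4, 8, 16)
```

With 4 views, 8 frames, and 16 spatial tokens, full attention scores 262,144 query-key pairs, while the factorized version scores 14,336. The gap widens as any of the three dimensions grows, which is the efficiency argument behind factorizing the 4D space.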
Taxonomy
Topics: Advanced Vision and Imaging · Human Motion and Animation · Generative Adversarial Networks and Image Synthesis
Methods: Diffusion
