CoMoVi: Co-Generation of 3D Human Motions and Realistic Videos

Chengfeng Zhao; Jiazhi Shu; Yubo Zhao; Tianyu Huang; Jiahao Lu; Zekai Gu; Chengwei Ren; Zhiyang Dou; Qing Shuai; Yuan Liu

arXiv:2601.10632 · cs.CV · April 13, 2026

TL;DR

CoMoVi introduces a unified diffusion-based framework that generates 3D human motions and realistic videos synchronously. It bridges the 3D-2D modality gap by projecting 3D motions into a 2D motion representation aligned with the video, and couples the two modalities through a dual-branch diffusion model.
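The summary does not specify which 2D motion representation CoMoVi uses. As a minimal sketch, assuming the simplest possible choice, the snippet below pinhole-projects camera-space 3D joints into normalized image coordinates so that each motion frame lines up with the corresponding video frame. The helper names (`project_frame`, `project_motion`) and the camera parameters (`focal`, `cx`, `cy`) are hypothetical illustrations, not the paper's code.

```python
from typing import List, Tuple

Joint3D = Tuple[float, float, float]  # camera-space (x, y, z), z pointing forward
Joint2D = Tuple[float, float]         # normalized image coords, in [-1, 1] when on screen


def project_frame(joints: List[Joint3D], focal: float, cx: float, cy: float,
                  width: int, height: int) -> List[Joint2D]:
    """Hypothetical helper: pinhole-project one frame of 3D joints onto the video frame."""
    projected = []
    for x, y, z in joints:
        if z <= 0:
            raise ValueError("joint lies behind the camera")
        u = focal * x / z + cx  # pixel column
        v = focal * y / z + cy  # pixel row
        # Normalize pixels to [-1, 1] so the 2D motion shares the video's coordinate frame.
        projected.append((2.0 * u / width - 1.0, 2.0 * v / height - 1.0))
    return projected


def project_motion(motion: List[List[Joint3D]], focal: float, cx: float, cy: float,
                   width: int, height: int) -> List[List[Joint2D]]:
    """Hypothetical helper: project a whole sequence (frames x joints) to a 2D motion."""
    return [project_frame(frame, focal, cx, cy, width, height) for frame in motion]
```

For example, with the principal point at the image center (`cx = width / 2`, `cy = height / 2`), a joint at `(0.0, 0.0, 2.0)` lands at `(0.0, 0.0)`, the middle of the frame.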

Contribution

The paper proposes a novel co-generation framework with a dual-branch diffusion model and a new dataset, enabling high-quality, synchronized 3D motion and video generation.

Findings

1. Generated 3D human motions with improved generalization.
2. Produced high-quality human-centric videos without external motion references.
3. Curated the large-scale CoMoVi-Dataset for training and evaluation.

Abstract

In this paper, we find that the generation of 3D human motions and 2D human videos is intrinsically coupled: 3D motions provide the structural prior for plausibility and consistency in videos, while pre-trained video models offer strong generalization capabilities for motions. Based on this, we present CoMoVi, a co-generative framework that generates 3D human motions and videos synchronously within a single diffusion denoising loop. Since there is a modality gap between 3D human motions and 2D human-centric videos, we project the 3D human motion into a 2D human motion representation that aligns effectively with the 2D videos. We then design a dual-branch diffusion model that couples the motion and video generation processes through mutual feature interaction and 3D-2D cross attentions. To train and evaluate our model, we curate…
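The abstract names "mutual feature interaction and 3D-2D cross attentions" between the two branches without giving layer details. The sketch below illustrates the general idea under heavy simplifying assumptions: a single head, identity query/key/value projections instead of learned ones, and plain Python lists instead of tensors. The function names (`cross_attention`, `dual_branch_step`) are hypothetical and this is not CoMoVi's actual architecture.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows are tokens, columns are feature dimensions


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys_values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention with identity Q/K/V projections (simplification)."""
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        # Weighted sum of the other branch's tokens, one feature dimension at a time.
        out.append([sum(w * kv[j] for w, kv in zip(weights, keys_values)) for j in range(d)])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def dual_branch_step(motion_tokens: Matrix, video_tokens: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical mutual-interaction step: each branch attends to the other, plus a residual."""
    motion_ctx = cross_attention(motion_tokens, video_tokens)  # motion queries video
    video_ctx = cross_attention(video_tokens, motion_tokens)   # video queries motion
    return add(motion_tokens, motion_ctx), add(video_tokens, video_ctx)
```

Both attention calls read the pre-update tokens, so neither branch sees the other's output from the same step; this keeps the exchange symmetric. In a real diffusion model this step would sit inside each transformer block of the denoiser and be repeated at every denoising iteration.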

Code & Models

Models

Hugging Face: AfterJourney/CoMoVi