SyncMV4D: Synchronized Multi-view Joint Diffusion of Appearance and Motion for Hand-Object Interaction Synthesis

Lingwei Dang; Zonghan Li; Juntong Li; Hongwen Zhang; Liang An; Yebin Liu; Qingyao Wu

arXiv:2511.19319 · cs.CV · March 9, 2026

TL;DR

SyncMV4D introduces a multi-view joint diffusion framework that generates synchronized, 3D-aware hand-object interaction (HOI) videos and 4D motions. It overcomes the limitations of single-view video methods and of 3D methods that depend on high-quality, lab-captured data.

Contribution

It is the first model to jointly generate synchronized multi-view HOI videos and 4D motions by unifying visual, motion, and multi-view geometry priors.

Findings

1. Outperforms state-of-the-art methods in realism and consistency
2. Generates plausible 4D motions from multi-view videos
3. Achieves high multi-view coherence in generated videos

Abstract

Hand-Object Interaction (HOI) generation plays a critical role in advancing applications across animation and robotics. Current video-based methods are predominantly single-view, which impedes comprehensive 3D geometry perception and often results in geometric distortions or unrealistic motion patterns. While 3D HOI approaches can generate dynamically plausible motions, their dependence on high-quality 3D data captured in controlled laboratory settings severely limits their generalization to real-world scenarios. To overcome these limitations, we introduce SyncMV4D, the first model that jointly generates synchronized multi-view HOI videos and 4D motions by unifying visual prior, motion dynamics, and multi-view geometry. Our framework features two core innovations: (1) a Multi-view Joint Diffusion (MJD) model that co-generates HOI videos and intermediate motions, and (2) a Diffusion…
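The abstract does not say how the Multi-view Joint Diffusion (MJD) model couples video and motion. As a rough illustration only, the sketch below assumes a standard DDPM forward process in which both modalities are noised to a shared timestep t with independent Gaussian noise, x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps. The linear beta schedule, the flat-list stand-ins for multi-view video latents and motion parameters, and all function names are hypothetical and not taken from the paper.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Standard DDPM linear beta schedule (an assumption; the paper's schedule is not stated)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def joint_forward_noise(video, motion, t, abar, rng):
    """Noise both modalities to the SAME timestep t (hypothetical joint setup).

    `video` and `motion` are flat lists of floats standing in for multi-view
    video latents and per-frame hand/object motion parameters.
    Returns the noised pair and the noise that a joint denoiser would predict.
    """
    a = abar[t]

    def noise(x):
        eps = [rng.gauss(0.0, 1.0) for _ in x]
        xt = [math.sqrt(a) * xi + math.sqrt(1.0 - a) * ei for xi, ei in zip(x, eps)]
        return xt, eps

    video_t, video_eps = noise(video)
    motion_t, motion_eps = noise(motion)
    return (video_t, motion_t), (video_eps, motion_eps)


if __name__ == "__main__":
    rng = random.Random(0)
    abar = alpha_bars(linear_beta_schedule(1000))
    (v_t, m_t), _ = joint_forward_noise([0.5, -0.2, 1.0], [0.1, -0.4], 500, abar, rng)
    print(len(v_t), len(m_t))
```

Sharing one timestep means a single denoiser sees both modalities at the same noise level and can predict both noise terms jointly, which is one common way to let appearance and motion inform each other during sampling; whether MJD works this way is not stated in the abstract.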

Taxonomy

Topics: 3D Shape Modeling and Analysis · Human Motion and Animation · Generative Adversarial Networks and Image Synthesis