SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios

Lingwei Dang; Ruizhi Shao; Hongwen Zhang; Wei Min; Yebin Liu; Qingyao Wu

arXiv:2506.02444 · cs.CV · June 6, 2025

TL;DR

SViMo introduces a synchronized diffusion framework that jointly generates high-fidelity hand-object interaction videos and 3D motions. It removes the reliance on predefined 3D object models that limits prior motion generation methods, and it improves physical plausibility over video-only approaches.

Contribution

The paper presents a novel synchronized diffusion approach combining visual priors and dynamic constraints for joint video and motion generation in HOI scenarios, with a closed-loop feedback mechanism.

Findings

01. Outperforms state-of-the-art methods in video and motion quality

02. Demonstrates strong generalization to unseen scenarios

03. Produces physically plausible and consistent HOI sequences

Abstract

Hand-Object Interaction (HOI) generation has significant application potential. However, current 3D HOI motion generation approaches heavily rely on predefined 3D object models and lab-captured motion data, limiting generalization capabilities. Meanwhile, HOI video generation methods prioritize pixel-level visual fidelity, often sacrificing physical plausibility. Recognizing that visual appearance and motion patterns share fundamental physical laws in the real world, we propose a novel framework that combines visual priors and dynamic constraints within a synchronized diffusion process to generate the HOI video and motion simultaneously. To integrate the heterogeneous semantics, appearance, and motion features, our method implements tri-modal adaptive modulation for feature aligning, coupled with 3D full-attention for modeling inter- and intra-modal dependencies. Furthermore, we…
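The modulation and attention steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the "tri-modal adaptive modulation" resembles a per-modality scale-and-shift applied after layer normalization, and that "full-attention" means single-head attention over the concatenated text, video, and motion tokens. All names (`adaptive_modulation`, `full_attention`, `joint_block`, `make_params`) and all shapes are hypothetical.

```python
import numpy as np

MODALITIES = ("text", "video", "motion")

def layer_norm(x, eps=1e-5):
    # Normalize each token over its feature dimension.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def adaptive_modulation(tokens, cond, w_scale, w_shift):
    # Assumed form: a conditioning vector (e.g. a timestep embedding)
    # predicts a per-feature scale and shift for this modality.
    scale = cond @ w_scale   # (d,)
    shift = cond @ w_shift   # (d,)
    return layer_norm(tokens) * (1.0 + scale) + shift

def full_attention(x, w_q, w_k, w_v):
    # Single-head softmax attention over every token in x, so each token
    # can attend to tokens of its own modality and of the other two.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

def joint_block(text, video, motion, cond, params):
    # Modulate each modality with its own parameters, attend jointly,
    # then split the result back into the three token streams.
    streams = {"text": text, "video": video, "motion": motion}
    modulated = [
        adaptive_modulation(streams[name], cond, *params["mod"][name])
        for name in MODALITIES
    ]
    joint = np.concatenate(modulated, axis=0)
    out, weights = full_attention(joint, *params["attn"])
    splits = np.cumsum([text.shape[0], video.shape[0]])
    t, v, m = np.split(out, splits, axis=0)
    return t, v, m, weights

def make_params(rng, d, c):
    # Random illustrative parameters; a real model would learn these.
    mod = {
        name: (0.1 * rng.normal(size=(c, d)), 0.1 * rng.normal(size=(c, d)))
        for name in MODALITIES
    }
    attn = tuple(rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
    return {"mod": mod, "attn": attn}
```

Under these assumptions, the attention matrix has one row per token across all three modalities, so its off-diagonal blocks carry the inter-modal dependencies and its diagonal blocks the intra-modal ones.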

Code & Models

Repositories

Droliven/SViMo_code (Official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · 3D Shape Modeling and Analysis

Methods: Diffusion