DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation

Hyeonwoo Kim; Jeonghwan Kim; Kyungwon Cho; Hanbyul Joo

arXiv:2604.20841·cs.CV·April 23, 2026

TL;DR

DeVI is a framework that uses text-conditioned synthetic videos as imitation targets for physically plausible dexterous human-object interaction in physics-based character control, compensating for the limited physical fidelity and purely 2D nature of generated videos.

Contribution

DeVI introduces a hybrid tracking reward and generalizes zero-shot to unseen target objects, enabling dexterous manipulation control from synthetic videos without requiring 3D demonstrations.

Findings

01. DeVI outperforms existing imitation methods in dexterous hand-object interactions.

02. It effectively generalizes to unseen objects and interaction types.

03. DeVI demonstrates success in multi-object scenes and diverse, text-driven actions.

Abstract

Recent advances in video generative models enable the synthesis of realistic human-object interaction videos across a wide range of scenarios and object categories, including complex dexterous manipulations that are difficult to capture with motion capture systems. While the rich interaction knowledge embedded in these synthetic videos holds strong potential for motion planning in dexterous robotic manipulation, their limited physical fidelity and purely 2D nature make them difficult to use directly as imitation targets in physics-based character control. We present DeVI (Dexterous Video Imitation), a novel framework that leverages text-conditioned synthetic videos to enable physically plausible dexterous agent control for interacting with unseen target objects. To overcome the imprecision of generative 2D cues, we introduce a hybrid tracking reward that integrates 3D human tracking…

Code & Models

Repositories

snuvclab/devi (GitHub)
