VHOI: Controllable Video Generation of Human-Object Interactions from Sparse Trajectories via Motion Densification

Wanyue Zhang; Lin Geng Foo; Thabo Beeler; Rishabh Dabral; Christian Theobalt

arXiv:2512.09646·cs.CV·April 9, 2026

TL;DR

VHOI is a two-stage framework for controllable human-object interaction (HOI) video generation: it first densifies sparse trajectories into dense HOI mask sequences, then fine-tunes a video diffusion model conditioned on those masks. The masks use an HOI-aware, color-coded motion representation.

Contribution

The paper presents an HOI-aware motion encoding and a two-stage pipeline (trajectory densification, then mask-conditioned generation) for controllable HOI video synthesis.

Findings

01 Achieves state-of-the-art results in controllable HOI video generation.

02 Can generate full human navigation sequences leading to object interactions.

03 Demonstrates the effectiveness of dense mask conditioning for realistic motion synthesis.

Abstract

Synthesizing realistic human-object interactions (HOI) in video is challenging due to the complex, instance-specific interaction dynamics of both humans and objects. Incorporating controllability in video generation further adds to the complexity. Existing controllable video generation approaches face a trade-off: sparse controls like keypoint trajectories are easy to specify but lack instance-awareness, while dense signals such as optical flow, depths or 3D meshes are informative but costly to obtain. We propose VHOI, a two-stage framework that first densifies sparse trajectories into HOI mask sequences, and then fine-tunes a video diffusion model conditioned on these dense masks. We introduce a novel HOI-aware motion representation that uses color encodings to distinguish not only human and object motion, but also body-part-specific dynamics. This design incorporates a human prior…
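
As a rough illustration of the color-encoding idea described in the abstract, the sketch below maps per-pixel integer labels to RGB colors so that the object and individual body parts remain distinguishable in each mask frame. The label set, the palette, and the function names `colorize_mask` and `colorize_sequence` are assumptions made for this example; this listing does not specify the paper's actual encoding.

```python
# Hypothetical label ids; the paper's real body-part taxonomy is not given here.
BACKGROUND, OBJECT, TORSO, LEFT_ARM, RIGHT_ARM = range(5)

# Hypothetical palette: one distinct RGB color per label.
PALETTE = {
    BACKGROUND: (0, 0, 0),
    OBJECT: (255, 255, 0),
    TORSO: (255, 0, 0),
    LEFT_ARM: (0, 255, 0),
    RIGHT_ARM: (0, 0, 255),
}


def colorize_mask(label_map):
    """Convert a 2D grid of integer labels into a 2D grid of RGB tuples."""
    return [[PALETTE[label] for label in row] for row in label_map]


def colorize_sequence(label_maps):
    """Apply the same palette to every frame so colors stay consistent over time."""
    return [colorize_mask(frame) for frame in label_maps]


# A tiny 2x3 "frame": torso above, arms on either side of the object below.
frame = [
    [BACKGROUND, TORSO, TORSO],
    [LEFT_ARM, OBJECT, RIGHT_ARM],
]
rgb_frame = colorize_mask(frame)
rgb_video = colorize_sequence([frame, frame])
```

Keeping the palette fixed across frames is the point of the sketch: a conditioning signal of this kind can only separate human, object, and body-part motion if each color means the same thing in every frame.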

Code & Models

Repositories

https://vcai.mpi-inf.mpg.de/projects/vhoi