Controllable Human-Object Interaction Synthesis
Jiaman Li, Alexander Clegg, Roozbeh Mottaghi, Jiajun Wu, Xavier Puig, C. Karen Liu

TL;DR
This paper introduces CHOIS, a diffusion-based method for synthesizing realistic, controllable human-object interactions in 3D scenes guided by language descriptions and sparse waypoints, with applications in long-term simulation.
Contribution
The paper presents a diffusion-based approach that adds an object geometry loss and contact guidance to synthesize realistic, controllable human-object interactions from language descriptions and sparse object waypoints.
Findings
Successfully generates realistic human-object interactions aligned with descriptions.
Effectively incorporates waypoints and contact constraints during synthesis.
Enables long-term interaction planning in 3D environments.
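Guidance at sampling time is often implemented by nudging an intermediate prediction along the negative gradient of a cost. The sketch below illustrates that general idea with a toy contact cost; the squared hand-to-contact distance, the function names, and the step size are all assumptions made for illustration, since this summary does not spell out the paper's actual guidance terms.

```python
import numpy as np


def contact_cost_and_grad(hand_pos, contact_pos):
    """Toy contact cost (assumed, not from the paper): squared distance
    between predicted hand positions and target contact points.

    Both inputs have shape (num_frames, 3). Returns the scalar cost and
    its gradient with respect to hand_pos.
    """
    diff = hand_pos - contact_pos
    cost = float(np.sum(diff ** 2))
    grad = 2.0 * diff
    return cost, grad


def guided_update(hand_pos, contact_pos, step_size=0.1):
    """One gradient step that pulls the predicted hands toward contact."""
    _, grad = contact_cost_and_grad(hand_pos, contact_pos)
    return hand_pos - step_size * grad


# Hypothetical single-frame example: the hand hovers 0.5 m above the contact point.
hand = np.array([[0.0, 0.0, 1.0]])
contact = np.array([[0.0, 0.0, 0.5]])
cost_before, _ = contact_cost_and_grad(hand, contact)
hand_after = guided_update(hand, contact)
cost_after, _ = contact_cost_and_grad(hand_after, contact)
```

In a real sampler, an update like this would be applied to the model's prediction at each denoising step rather than once to a fixed array.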
Abstract
Synthesizing semantic-aware, long-horizon, human-object interaction is critical to simulate realistic human behaviors. In this work, we address the challenging problem of generating synchronized object motion and human motion guided by language descriptions in 3D scenes. We propose Controllable Human-Object Interaction Synthesis (CHOIS), an approach that generates object motion and human motion simultaneously using a conditional diffusion model given a language description, initial object and human states, and sparse object waypoints. Here, language descriptions inform style and intent, and waypoints, which can be effectively extracted from high-level planning, ground the motion in the scene. Naively applying a diffusion model fails to predict object motion aligned with the input waypoints; it also cannot ensure the realism of interactions that require precise hand-object and…
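The abstract lists an initial object state and sparse object waypoints among the conditioning inputs. One common way to hand such sparse constraints to a conditional model is a dense per-frame array plus a binary mask marking which frames are constrained. The sketch below assumes that layout; the function name, the 3D position format, and the frame indices are hypothetical and are not taken from the paper.

```python
import numpy as np


def build_waypoint_condition(init_state, waypoints, num_frames):
    """Pack an initial state and sparse waypoints into one array.

    init_state: array of shape (dim,), the object position at frame 0.
    waypoints:  dict mapping frame index -> array of shape (dim,).
    Returns an array of shape (num_frames, dim + 1); the last channel is
    1.0 on constrained frames and 0.0 elsewhere (assumed layout).
    """
    dim = init_state.shape[0]
    cond = np.zeros((num_frames, dim), dtype=np.float32)
    mask = np.zeros((num_frames, 1), dtype=np.float32)

    cond[0] = init_state
    mask[0] = 1.0
    for frame, position in waypoints.items():
        if not 0 <= frame < num_frames:
            raise ValueError(f"waypoint frame {frame} is out of range")
        cond[frame] = position
        mask[frame] = 1.0

    return np.concatenate([cond, mask], axis=-1)


# Hypothetical 90-frame sequence with waypoints at frames 30 and 60.
cond = build_waypoint_condition(
    init_state=np.array([0.0, 0.0, 0.5]),
    waypoints={30: np.array([1.0, 0.0, 0.5]), 60: np.array([2.0, 1.0, 0.5])},
    num_frames=90,
)
```

Unconstrained frames stay at zero with a zero mask, so a model can tell "no constraint" apart from "constrained to the origin".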
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Multimodal Machine Learning Applications
Methods: Diffusion
