Controllable Human-Object Interaction Synthesis

Jiaman Li; Alexander Clegg; Roozbeh Mottaghi; Jiajun Wu; Xavier Puig; C. Karen Liu

arXiv:2312.03913 · cs.CV · July 16, 2024 · 2 cites

TL;DR

This paper introduces CHOIS, a diffusion-based method for synthesizing realistic, controllable human-object interactions in 3D scenes guided by language descriptions and sparse waypoints, with applications in long-term simulation.

Contribution

The paper presents a novel diffusion model approach with object geometry loss and contact guidance for realistic, controllable human-object interaction synthesis from language and sparse cues.

Findings

01. Successfully generates realistic human-object interactions aligned with descriptions.

02. Effectively incorporates waypoints and contact constraints during synthesis.

03. Enables long-term interaction planning in 3D environments.

Abstract

Synthesizing semantic-aware, long-horizon, human-object interaction is critical to simulate realistic human behaviors. In this work, we address the challenging problem of generating synchronized object motion and human motion guided by language descriptions in 3D scenes. We propose Controllable Human-Object Interaction Synthesis (CHOIS), an approach that generates object motion and human motion simultaneously using a conditional diffusion model given a language description, initial object and human states, and sparse object waypoints. Here, language descriptions inform style and intent, and waypoints, which can be effectively extracted from high-level planning, ground the motion in the scene. Naively applying a diffusion model fails to predict object motion aligned with the input waypoints; it also cannot ensure the realism of interactions that require precise hand-object and…
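As a rough illustration of what "sparse object waypoints" could mean as a conditioning signal, the sketch below builds a per-frame waypoint array with a binary mask and measures how far a predicted object trajectory deviates at the waypoint frames. This is not the authors' implementation: the function names `build_waypoint_condition` and `waypoint_error`, the 3D point representation, the zero-filling of unknown frames, and the mean-distance error measure are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def build_waypoint_condition(
    num_frames: int, waypoints: Dict[int, Point]
) -> Tuple[List[Point], List[int]]:
    """Hypothetical helper: expand sparse waypoints into a dense per-frame
    array plus a 0/1 mask marking the frames where a waypoint is given."""
    values: List[Point] = [(0.0, 0.0, 0.0)] * num_frames  # unknown frames zero-filled
    mask = [0] * num_frames
    for frame, position in waypoints.items():
        if not 0 <= frame < num_frames:
            raise ValueError(f"waypoint frame {frame} is outside 0..{num_frames - 1}")
        values[frame] = position
        mask[frame] = 1
    return values, mask


def waypoint_error(
    predicted: List[Point], values: List[Point], mask: List[int]
) -> float:
    """Hypothetical metric: mean Euclidean distance between predicted object
    positions and the given waypoints, evaluated only at masked frames."""
    distances = [
        math.dist(pred, target)
        for pred, target, known in zip(predicted, values, mask)
        if known
    ]
    if not distances:
        raise ValueError("no waypoint frames to evaluate")
    return sum(distances) / len(distances)


# Usage: a 5-frame clip with waypoints on the first and last frame only.
values, mask = build_waypoint_condition(5, {0: (0.0, 0.0, 0.0), 4: (1.0, 2.0, 0.0)})
predicted = [(0.0, 0.0, 0.0)] * 4 + [(1.0, 2.0, 2.0)]
error = waypoint_error(predicted, values, mask)
```

A mask of this kind lets a model distinguish "no waypoint given" from "waypoint at the origin"; an error measure of this kind is one simple way to check whether generated object motion actually follows the input waypoints, which the abstract notes a naive diffusion model fails to do.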

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Multimodal Machine Learning Applications

Methods: Diffusion