CG-HOI: Contact-Guided 3D Human-Object Interaction Generation

Christian Diller; Angela Dai

arXiv:2311.16097 · cs.CV · May 20, 2024 · 2 citations

TL;DR

CG-HOI is a novel method that generates realistic 3D human-object interaction sequences from text by modeling contact as a key guidance factor, ensuring physical plausibility and coherence.

Contribution

It introduces the first approach to generate dynamic 3D HOIs from text using contact-guided joint diffusion modeling of human and object motions.
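According to the abstract, human motion, object motion, and contact are modeled in one joint diffusion process and inter-correlated through cross-attention. The NumPy sketch below illustrates only that coupling idea: single-head scaled dot-product cross-attention in which each stream attends to the other two and is updated residually. The stream names, token shapes, random weights, and the `joint_block` helper are hypothetical. The actual CG-HOI architecture (heads, layers, embeddings, timestep conditioning) is not described on this page.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (T_q, d) tokens of one stream; context: (T_c, d) tokens of the others.
    Returns the attended output (T_q, d) and the attention weights (T_q, T_c).
    """
    q, k, v = queries @ w_q, context @ w_k, context @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v, weights

def joint_block(streams, params):
    """Hypothetical joint layer: every stream attends to the concatenation of the
    other streams; all updates are computed from the same (unmodified) input."""
    updated = {}
    for name, tokens in streams.items():
        others = np.concatenate([s for n, s in streams.items() if n != name], axis=0)
        out, _ = cross_attention(tokens, others, *params[name])
        updated[name] = tokens + out  # residual connection
    return updated

# Toy usage with random tokens and weights (illustrative only).
rng = np.random.default_rng(0)
T, d = 8, 16
streams = {n: rng.standard_normal((T, d)) for n in ("human", "object", "contact")}
params = {n: tuple(rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3)) for n in streams}
out = joint_block(streams, params)
```

Computing every update from the same input keeps the layer symmetric, so no stream is favored by update order; this is a choice made for the sketch, not a detail taken from the paper.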

Findings

1. Produces realistic, physically plausible interaction sequences
2. Enables human motion generation conditioned on object trajectories without retraining
3. Applicable to static 3D scene scans
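This page does not say how human motion is generated for a given object trajectory without retraining. One standard way to condition a trained diffusion model at sampling time is inpainting-style replacement: at every denoising step, the known channels are overwritten with the given data noised to the current level. The sketch below applies that technique with a plain DDPM ancestral sampler. The per-frame [human | object] channel layout, the linear noise schedule, and the `denoise_x0` network (which predicts the clean sample) are assumptions, and `toy_denoiser` is a hypothetical stand-in. This is not claimed to be CG-HOI's exact procedure.

```python
import numpy as np

def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (assumed, for illustration)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)

def q_sample(x0, t, alpha_bars, rng):
    """Forward process: noise a clean sample x0 to diffusion step t."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

def sample_given_object(denoise_x0, obj_traj, human_dim, schedule, rng):
    """DDPM ancestral sampling over [human | object] channels per frame,
    overwriting the object channels with the known trajectory at every step."""
    betas, alphas, alpha_bars = schedule
    frames, obj_dim = obj_traj.shape
    x = rng.standard_normal((frames, human_dim + obj_dim))
    for t in reversed(range(len(betas))):
        x[:, human_dim:] = q_sample(obj_traj, t, alpha_bars, rng)
        x0_hat = denoise_x0(x, t)
        if t == 0:
            x = x0_hat
            break
        ab_t, ab_prev = alpha_bars[t], alpha_bars[t - 1]
        # Mean and variance of the DDPM posterior q(x_{t-1} | x_t, x0_hat).
        mean = (np.sqrt(ab_prev) * betas[t] / (1 - ab_t)) * x0_hat + (
            np.sqrt(alphas[t]) * (1 - ab_prev) / (1 - ab_t)
        ) * x
        var = betas[t] * (1 - ab_prev) / (1 - ab_t)
        x = mean + np.sqrt(var) * rng.standard_normal(x.shape)
    x[:, human_dim:] = obj_traj  # returned object channels are exactly the given trajectory
    return x

def toy_denoiser(x, t):
    """Hypothetical network stand-in: predicts a 'human' that tracks the object channels."""
    obj = x[:, 3:]
    return np.concatenate([obj, obj], axis=1)

rng = np.random.default_rng(0)
traj = np.stack([np.linspace(0.0, 1.0, 20), np.zeros(20), np.zeros(20)], axis=1)
out = sample_given_object(toy_denoiser, traj, human_dim=3, schedule=make_schedule(), rng=rng)
```

Because only the sampling loop changes, the same trained denoiser serves both unconditional and trajectory-conditioned generation, which is the property the finding describes.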

Abstract

We propose CG-HOI, the first method to address the task of generating dynamic 3D human-object interactions (HOIs) from text. We model the motion of both human and object in an interdependent fashion, as semantically rich human motion rarely happens in isolation without any interactions. Our key insight is that explicitly modeling contact between the human body surface and object geometry can be used as strong proxy guidance, both during training and inference. Using this guidance to bridge human and object motion enables generating more realistic and physically plausible interaction sequences, where the human body and corresponding object move in a coherent manner. Our method first learns to model human motion, object motion, and contact in a joint diffusion process, inter-correlated through cross-attention. We then leverage this learned contact for guidance during inference to…
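The abstract states that the learned contact is used for guidance during inference, but the description of the guidance is cut off here. The sketch below is a hedged illustration under assumed choices. Predicted per-point contact probabilities weight the squared distance from each human surface point to its nearest object point, and a few gradient-descent steps adjust only a rigid object translation. The function names, the loss, the nearest-point assignment, and the step size are hypothetical and are not taken from the paper.

```python
import numpy as np

def contact_loss_and_grad(human_pts, obj_pts, contact_prob, translation):
    """Contact-weighted squared distance from each human point to its nearest
    translated object point, plus the gradient w.r.t. the translation.
    Nearest-point assignments are held fixed when differentiating."""
    moved = obj_pts + translation                          # (M, 3)
    diff = human_pts[:, None, :] - moved[None, :, :]        # (N, M, 3)
    nearest = (diff ** 2).sum(axis=-1).argmin(axis=1)       # (N,)
    residual = diff[np.arange(len(human_pts)), nearest]     # (N, 3)
    loss = float((contact_prob * (residual ** 2).sum(axis=-1)).sum())
    grad = (-2.0 * contact_prob[:, None] * residual).sum(axis=0)  # (3,)
    return loss, grad

def guide_translation(human_pts, obj_pts, contact_prob, translation, step=0.1, iters=10):
    """Plain gradient descent on the object translation, pulling the object
    toward human points that are predicted to be in contact."""
    for _ in range(iters):
        _, grad = contact_loss_and_grad(human_pts, obj_pts, contact_prob, translation)
        translation = translation - step * grad
    return translation

# Toy example: one human point predicted in contact, one predicted not in contact.
human = np.array([[1.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
obj = np.zeros((1, 3))
prob = np.array([1.0, 0.0])
t0 = np.zeros(3)
before, _ = contact_loss_and_grad(human, obj, prob, t0)
t1 = guide_translation(human, obj, prob, t0)
after, _ = contact_loss_and_grad(human, obj, prob, t1)
```

In a diffusion sampler, an update like this could be applied to the model's intermediate clean-sample prediction at each denoising step; where and how CG-HOI applies its guidance is not stated in the truncated abstract.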

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Multimodal Machine Learning Applications

Methods: Diffusion