InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction
Sirui Xu, Ziyin Wang, Yu-Xiong Wang, Liang-Yan Gui

TL;DR
InterDreamer is a novel framework that generates realistic 3D human-object interaction sequences from text without requiring interaction-specific training data, by decoupling semantics and dynamics and leveraging large pre-trained models.
Contribution
The paper introduces a zero-shot method for 3D human-object interaction generation that combines large language and motion models with a physics-based world model.
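As a rough illustration of what decoupling semantics from dynamics could look like, the toy sketch below separates a text-planning step, a human-motion step, and an object-dynamics step. It is not the paper's implementation: every function name, the keyword rule, the linear joint trajectory, and the contact radius are hypothetical stand-ins for the language model, the text-to-motion model, and the physics-based world model.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def toy_semantics(text: str) -> dict:
    """Hypothetical stand-in for the language-model stage: choose a contact part."""
    part = "hand" if ("lift" in text or "carry" in text) else "hip"
    return {"motion_prompt": text, "contact_part": part}


def toy_motion(prompt: str, frames: int) -> List[Vec3]:
    """Hypothetical stand-in for a text-to-motion model.

    Ignores the prompt and returns one joint rising 0.1 units per frame.
    """
    return [(0.0, 0.1 * t, 0.0) for t in range(frames)]


def toy_world_model(obj: Vec3, joint_prev: Vec3, joint_next: Vec3,
                    radius: float = 0.2) -> Vec3:
    """Toy dynamics step (assumption): the object copies the joint's
    displacement while it lies within `radius` of the joint."""
    dist = sum((o - j) ** 2 for o, j in zip(obj, joint_prev)) ** 0.5
    if dist > radius:
        return obj  # no contact, so the object stays where it is
    return tuple(o + (n - p) for o, n, p in zip(obj, joint_next, joint_prev))


def generate(text: str, frames: int = 5,
             obj_start: Vec3 = (0.0, 0.0, 0.0)):
    """Semantics first, then human motion, then object dynamics frame by frame."""
    plan = toy_semantics(text)
    joints = toy_motion(plan["motion_prompt"], frames)
    objs = [obj_start]
    for prev, nxt in zip(joints, joints[1:]):
        objs.append(toy_world_model(objs[-1], prev, nxt))
    return plan, joints, objs


plan, joints, objs = generate("lift the box")
print(plan["contact_part"], objs[-1])
```

In this sketch the object trajectory is never produced by the text-conditioned components; it only responds to the generated human motion through the dynamics step, which is the separation the summary describes.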
Findings
Generates realistic 3D human-object interaction (HOI) sequences that align with the input text
Requires no direct training on text-interaction pair data
Validated on BEHAVE and CHAIRS datasets
Abstract
Text-conditioned human motion generation has experienced significant advancements with diffusion models trained on extensive motion capture data and corresponding textual annotations. However, extending such success to 3D dynamic human-object interaction (HOI) generation faces notable challenges, primarily due to the lack of large-scale interaction data and comprehensive descriptions that align with these interactions. This paper takes the initiative and showcases the potential of generating human-object interactions without direct training on text-interaction pair data. Our key insight in achieving this is that interaction semantics and dynamics can be decoupled. Being unable to learn interaction semantics through supervised training, we instead leverage pre-trained large models, synergizing knowledge from a large language model and a text-to-motion model. While such knowledge offers…
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Handwritten Text Recognition Techniques
Methods: Diffusion · ALIGN
