InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction

Sirui Xu; Ziyin Wang; Yu-Xiong Wang; Liang-Yan Gui

arXiv:2403.19652 · cs.CV · February 3, 2026 · 2 cites

TL;DR

InterDreamer is a novel framework that generates realistic 3D human-object interaction sequences from text without requiring interaction-specific training data, by decoupling semantics and dynamics and leveraging large pre-trained models.

Contribution

The paper introduces a zero-shot method for 3D human-object interaction generation that combines large language and motion models with a physics-based world model.
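
The decoupling of semantics and dynamics can be pictured as a three-stage pipeline. The Python sketch below is purely illustrative and is not the paper's implementation: every function name, the `InteractionPlan` type, the keyword rule standing in for a language model, the straight-line motion standing in for a text-to-motion model, and the "object follows the contact point" rule standing in for a physics-based world model are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class InteractionPlan:
    """Hypothetical output of the semantic stage."""
    motion_prompt: str  # text that would be passed to a text-to-motion model
    contact_part: str   # body part assumed to touch the object


def plan_semantics(text: str) -> InteractionPlan:
    # Stand-in for a large language model (assumption): a fixed keyword rule.
    part = "foot" if "kick" in text.lower() else "hand"
    return InteractionPlan(motion_prompt=text, contact_part=part)


def generate_human_motion(plan: InteractionPlan, num_frames: int) -> List[Vec3]:
    # Stand-in for a pre-trained text-to-motion model (assumption):
    # the contact part simply moves along the x axis at constant height.
    return [(0.1 * t, 0.0, 1.0) for t in range(num_frames)]


def step_world_model(obj_pos: Vec3, contact_prev: Vec3, contact_now: Vec3) -> Vec3:
    # Toy dynamics (assumption): the object is displaced by exactly the
    # motion of the contact point between two consecutive frames.
    x, y, z = (o + (n - p) for o, p, n in zip(obj_pos, contact_prev, contact_now))
    return (x, y, z)


def generate_interaction(text: str, obj_start: Vec3, num_frames: int = 5):
    # Semantics first (what to do), then dynamics (how the object responds).
    plan = plan_semantics(text)
    human = generate_human_motion(plan, num_frames)
    obj = [obj_start]
    for prev, now in zip(human, human[1:]):
        obj.append(step_world_model(obj[-1], prev, now))
    return human, obj


if __name__ == "__main__":
    human, obj = generate_interaction("A person lifts a box", (1.0, 0.0, 1.0))
    for h, o in zip(human, obj):
        print(h, o)
```

The point of the structure is that the semantic stage never sees object dynamics and the dynamics stage never sees text, so each stub could in principle be swapped for a real model without changing the other.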

Findings

01 Successfully generates realistic 3D HOI sequences aligned with text

02 Operates without direct training on interaction data

03 Validated on BEHAVE and CHAIRS datasets

Abstract

Text-conditioned human motion generation has experienced significant advancements with diffusion models trained on extensive motion capture data and corresponding textual annotations. However, extending such success to 3D dynamic human-object interaction (HOI) generation faces notable challenges, primarily due to the lack of large-scale interaction data and comprehensive descriptions that align with these interactions. This paper takes the initiative and showcases the potential of generating human-object interactions without direct training on text-interaction pair data. Our key insight in achieving this is that interaction semantics and dynamics can be decoupled. Being unable to learn interaction semantics through supervised training, we instead leverage pre-trained large models, synergizing knowledge from a large language model and a text-to-motion model. While such knowledge offers…

Videos

InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction · SlidesLive

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Handwritten Text Recognition Techniques

Methods: Diffusion · ALIGN