Generating Human Motion in 3D Scenes from Text Descriptions

Zhi Cen; Huaijin Pi; Sida Peng; Zehong Shen; Minghui Yang; Shuai Zhu; Hujun Bao; Xiaowei Zhou

arXiv:2405.07784 · cs.CV · May 14, 2024

TL;DR

This paper introduces a method for generating realistic 3D human motions within indoor scenes based on text descriptions, emphasizing human-scene interactions and spatial reasoning.

Contribution

The paper proposes a two-step approach: a large language model first grounds the text description to the target object in the scene, and an object-centric model then generates the motion around that object.

Findings

01. Outperforms baseline methods in motion quality

02. Effective integration of scene context and text descriptions

03. Validates the importance of spatial reasoning in motion generation

Abstract

Generating human motions from textual descriptions has gained growing research interest due to its wide range of applications. However, only a few works consider human-scene interactions together with text conditions, which is crucial for visual and physical realism. This paper focuses on the task of generating human motions in 3D indoor scenes given text descriptions of the human-scene interactions. This task presents challenges due to the multimodal nature of text, scene, and motion, as well as the need for spatial reasoning. To address these challenges, we propose a new approach that decomposes the complex problem into two more manageable sub-problems: (1) language grounding of the target object and (2) object-centric motion generation. For language grounding of the target object, we leverage the power of large language models. For motion generation, we design an object-centric…
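
The two sub-problems named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: every name here (`SceneObject`, `ground_target_object`, `to_object_centric`) is hypothetical, a simple keyword match stands in for the large language model used for grounding, and translating scene points into the target object's frame is only an assumed reading of "object-centric". The motion generator itself is omitted.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    """Hypothetical scene entry: a semantic label and a 3D center."""
    label: str
    center: tuple[float, float, float]


def ground_target_object(text: str, objects: list[SceneObject]) -> SceneObject:
    """Step 1 (placeholder): pick the object the text refers to.

    A keyword match stands in for the LLM-based grounding; it returns
    the first object whose label appears in the description.
    """
    lowered = text.lower()
    for obj in objects:
        if obj.label.lower() in lowered:
            return obj
    raise ValueError("no scene object is mentioned in the text")


def to_object_centric(points, target: SceneObject):
    """Step 2 (input preparation, assumed): express scene points
    relative to the target object's center."""
    cx, cy, cz = target.center
    return [(x - cx, y - cy, z - cz) for x, y, z in points]


scene = [
    SceneObject("chair", (1.0, 0.0, 2.0)),
    SceneObject("bed", (4.0, 0.0, 1.0)),
]
target = ground_target_object("sit on the chair near the window", scene)
local_points = to_object_centric([(1.5, 0.0, 2.0), (1.0, 1.0, 2.0)], target)
```

Separating the steps this way means the second stage only ever sees geometry expressed around one object, whichever object the first stage selects.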

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Multimodal Machine Learning Applications
