UniHM: Universal Human Motion Generation with Object Interactions in Indoor Scenes
Zichen Geng, Zeeshan Hayder, Wei Liu, and Ajmal Mian

TL;DR
UniHM is a novel diffusion-based model that enables realistic, scene-aware human motion synthesis from text prompts, supporting complex object interactions in indoor environments.
Contribution
UniHM introduces a unified framework that supports both Text-to-Motion and Text-to-Human-Object Interaction (HOI) generation, together with a new motion representation and an enhanced dataset.
Findings
Achieves competitive results on the OMOMO benchmark for HOI synthesis.
Yields strong performance on HumanML3D for general motion generation.
Outperforms traditional VQ-VAEs in motion reconstruction and generation.
Abstract
Human motion synthesis in complex scenes presents a fundamental challenge, extending beyond conventional Text-to-Motion tasks by requiring the integration of diverse modalities such as static environments, movable objects, natural language prompts, and spatial waypoints. Existing language-conditioned motion models often struggle with scene-aware motion generation due to limitations in motion tokenization, which leads to information loss and fails to capture the continuous, context-dependent nature of 3D human movement. To address these issues, we propose UniHM, a unified motion language model that leverages diffusion-based generation for synthesizing scene-aware human motion. UniHM is the first framework to support both Text-to-Motion and Text-to-Human-Object Interaction (HOI) in complex 3D scenes. Our approach introduces three key contributions: (1) a mixed-motion representation that…
Taxonomy
Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications
