Human Motion Synthesis in 3D Scenes via Unified Scene Semantic Occupancy

Gong Jingyu; Tong Kunkun; Chen Zhuoran; Yuan Chuanhan; Chen Mingang; Zhang Zhizhong; Tan Xin; Xie Yuan

arXiv:2511.07819 · cs.CV · November 12, 2025

Open Access

TL;DR

This paper introduces SSOMotion, a novel human motion synthesis framework that uses a unified scene semantic occupancy representation, combining scene semantics and structure for improved motion control in complex 3D environments.

Contribution

The paper proposes a new unified scene semantic occupancy representation and a bi-directional tri-plane decomposition for efficient, fine-grained scene understanding in human motion synthesis.
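To make the tri-plane idea concrete, here is a minimal pure-Python sketch. It is not the authors' method. This summary does not define "bi-directional", so the sketch assumes it means scanning each of the three axes from both ends and recording, for every plane cell, the semantic label and depth of the first occupied voxel. The grid layout `grid[x][y][z]`, the convention that label 0 means empty space, and the names `first_hit` and `bidirectional_triplane` are all hypothetical.

```python
from typing import Dict, List, Tuple

Grid = List[List[List[int]]]       # grid[x][y][z] -> semantic label, 0 = empty (assumed)
Plane = List[List[Tuple[int, int]]]  # plane[i][j] -> (label, depth of first hit)


def first_hit(column: List[int]) -> Tuple[int, int]:
    """Return (label, depth) of the first occupied voxel in a column, or (0, len) if none."""
    for depth, label in enumerate(column):
        if label != 0:
            return label, depth
    return 0, len(column)


def bidirectional_triplane(grid: Grid) -> Dict[str, Plane]:
    """Project a semantic occupancy grid onto XY, XZ and YZ, scanning each axis from both ends."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    # For each plane: its 2D size and how to read the voxel column perpendicular to it.
    views = {
        "xy": ((nx, ny), lambda i, j: [grid[i][j][k] for k in range(nz)]),  # scan along z
        "xz": ((nx, nz), lambda i, j: [grid[i][k][j] for k in range(ny)]),  # scan along y
        "yz": ((ny, nz), lambda i, j: [grid[k][i][j] for k in range(nx)]),  # scan along x
    }
    planes: Dict[str, Plane] = {}
    for name, ((n0, n1), column) in views.items():
        # "+" scans from index 0 upward; "-" scans from the far end (depth counted from that end).
        planes[name + "+"] = [[first_hit(column(i, j)) for j in range(n1)] for i in range(n0)]
        planes[name + "-"] = [[first_hit(column(i, j)[::-1]) for j in range(n1)] for i in range(n0)]
    return planes


# Toy 2 x 2 x 3 scene: label 3 at (0, 0, 1), label 5 at (1, 1, 2).
grid = [
    [[0, 3, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 5]],
]
planes = bidirectional_triplane(grid)
print(planes["xy+"][1][1], planes["xy-"][1][1])  # (5, 2) (5, 0)
```

For an n×n×n grid the six planes hold 6n² entries rather than n³ voxels, which illustrates why a plane decomposition is more compact than the full occupancy volume. The sketch stores raw labels and depths only for readability.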

Findings

01

Outperforms existing methods in cluttered scenes

02

Demonstrates strong generalization across datasets

03

Validates effectiveness through extensive ablation studies

Abstract

Human motion synthesis in 3D scenes relies heavily on scene comprehension, yet current methods focus mainly on scene structure and ignore semantic understanding. In this paper, we propose a human motion synthesis framework, termed SSOMotion, that takes a unified Scene Semantic Occupancy (SSO) as its scene representation. We design a bi-directional tri-plane decomposition to derive a compact version of the SSO, and scene semantics are mapped to a unified feature space via CLIP encoding and a shared linear dimensionality reduction. This strategy captures fine-grained scene semantic structures while significantly reducing redundant computation. We further use these scene hints, together with the movement direction derived from instructions, for motion control via frame-wise scene queries. Extensive experiments and ablation studies conducted on cluttered scenes using ShapeNet furniture, as well as…
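The semantic mapping and frame-wise query described in the abstract can be illustrated with a toy sketch. None of the following comes from the paper: each semantic class is assumed to already have a fixed embedding vector (standing in for a CLIP text embedding, which is not computed here), one shared matrix `W` reduces every class embedding into the same low-dimensional space, and a "frame-wise scene query" is approximated by looking up the feature of the plane cell under the character's root at each frame. The helper names (`project`, `build_semantic_table`, `frame_wise_query`) and all numbers are hypothetical, and the instruction-derived movement direction is not modeled.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def project(embedding: Sequence[float], weight: Matrix) -> Vector:
    """Apply the shared linear map W (out_dim x in_dim) to one embedding."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in weight]


def build_semantic_table(class_embeddings: Dict[int, Vector], weight: Matrix) -> Dict[int, Vector]:
    """Reduce every class embedding with the same W, so all labels share one feature space."""
    table = {label: project(emb, weight) for label, emb in class_embeddings.items()}
    table[0] = [0.0] * len(weight)  # assumption: empty space (label 0) maps to a zero feature
    return table


def frame_wise_query(
    plane: List[List[int]],
    table: Dict[int, Vector],
    root_xy: Sequence[Tuple[float, float]],
    cell_size: float,
) -> List[Vector]:
    """For each frame, return the reduced semantic feature of the plane cell under the root."""
    n0, n1 = len(plane), len(plane[0])
    features = []
    for x, y in root_xy:
        # Convert the world position to a cell index, clamped to the plane bounds.
        i = min(max(int(x // cell_size), 0), n0 - 1)
        j = min(max(int(y // cell_size), 0), n1 - 1)
        features.append(table[plane[i][j]])
    return features


# Hypothetical 4-D class embeddings (stand-ins for CLIP text features).
embeddings = {1: [1.0, 0.0, 2.0, 0.0], 2: [0.0, 1.0, 0.0, 2.0]}
W = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]]  # shared 4 -> 2 reduction
table = build_semantic_table(embeddings, W)
plane = [[0, 1], [2, 0]]  # semantic labels on one projected plane
trajectory = [(0.1, 0.7), (0.6, 0.2), (0.9, 0.9)]  # root (x, y) per frame
print(frame_wise_query(plane, table, trajectory, cell_size=0.5))
# [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
```

Sharing one `W` across classes is what keeps all labels in a single feature space: cells with the same label always receive the same reduced feature, and the reduction is computed once per class rather than once per voxel.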

Taxonomy

Topics: Human Motion and Animation · 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis