Object-Aware 4D Human Motion Generation

Shurui Gui; Deep Anil Patel; Xiner Li; Martin Renqiang Min

arXiv:2511.00248 · cs.CV · November 4, 2025

Open Access

TL;DR

This paper introduces a zero-shot, object-aware 4D human motion generation framework that leverages 3D priors, large language models, and motion diffusion models to produce realistic, physically plausible human motions in complex scenes.

Contribution

It proposes a novel zero-shot method combining 3D Gaussian representations, LLMs, and motion diffusion priors for object-aware human motion generation without retraining.

Findings

01. Produces natural, physically plausible motions respecting 3D spatial context

02. Generalizes to out-of-distribution object interactions without retraining

03. Outperforms prior methods in realism and physical consistency

Abstract

Recent advances in video diffusion models have enabled the generation of high-quality videos. However, these videos still suffer from unrealistic deformations, semantic violations, and physical inconsistencies that are largely rooted in the absence of 3D physical priors. To address these challenges, we propose an object-aware 4D human motion generation framework grounded in 3D Gaussian representations and motion diffusion priors. With pre-generated 3D humans and objects, our method, Motion Score Distilled Interaction (MSDI), employs the spatial and prompt semantic information in large language models (LLMs) and motion priors through the proposed Motion Diffusion Score Distillation Sampling (MSDS). The combination of MSDS and LLMs enables our spatial-aware motion optimization, which distills score gradients from pre-trained motion diffusion models, to refine human motion while respecting…
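The abstract describes distilling score gradients from a pre-trained motion diffusion model to refine motion. As a rough illustration of the generic score distillation idea only, and not the paper's MSDS implementation, the sketch below optimizes a small parameter vector with a score-distillation-style gradient. Everything here is an assumption made for illustration: the names `toy_noise_predictor`, `sds_gradient`, and `optimize` are hypothetical, the "diffusion model" is a closed-form stand-in for a prior concentrated on one target vector, the weighting w(t) = 1 - alpha_bar is one common choice, and the parameters are used directly as the sample (identity Jacobian).

```python
import math
import random


def toy_noise_predictor(x_t, alpha_bar, target):
    """Hypothetical stand-in for a pre-trained motion diffusion model.

    For a prior concentrated on `target`, the optimal noise estimate
    has this closed form; a real model would be a trained network.
    """
    scale = math.sqrt(1.0 - alpha_bar)
    return [(xt - math.sqrt(alpha_bar) * m) / scale for xt, m in zip(x_t, target)]


def sds_gradient(theta, target, rng):
    """One score-distillation-style gradient estimate for parameters theta."""
    # Sample a noise level (assumed range) and Gaussian noise.
    alpha_bar = rng.uniform(0.1, 0.9)
    eps = [rng.gauss(0.0, 1.0) for _ in theta]
    # Forward-diffuse the current sample: x_t = sqrt(a) * x + sqrt(1 - a) * eps.
    x_t = [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
        for x, e in zip(theta, eps)
    ]
    eps_hat = toy_noise_predictor(x_t, alpha_bar, target)
    # Assumed weighting w(t) = 1 - alpha_bar; the Jacobian dx/dtheta is the
    # identity here because theta is used directly as the sample.
    weight = 1.0 - alpha_bar
    return [weight * (eh - e) for eh, e in zip(eps_hat, eps)]


def optimize(theta, target, steps=500, lr=0.1, seed=0):
    """Plain gradient descent using the distilled gradient estimate."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        grad = sds_gradient(theta, target, rng)
        theta = [x - lr * g for x, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    refined = optimize([0.0, 2.0, -1.0], [1.0, 1.0, 1.0])
    print(refined)
```

In this toy setting the difference between predicted and sampled noise is proportional to the gap between the current parameters and the prior's target, so the parameters are pulled toward what the prior considers likely. A real pipeline would swap in a trained motion diffusion model and backpropagate through whatever maps parameters to motion.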

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Human Pose and Action Recognition