TSTMotion: Training-free Scene-aware Text-to-motion Generation

Ziyan Guo, Haoxuan Qu, Hossein Rahmani, Dewen Soh, Ping Hu, Qiuhong Ke, and Jun Liu

arXiv:2505.01182 · cs.CV · May 6, 2025

TL;DR

TSTMotion is a novel training-free framework that enables pre-trained blank-background motion generators to produce scene-aware human motions from a given 3D scene and text description, reducing reliance on large-scale ground-truth motion data.

Contribution

The paper introduces the first training-free scene-aware text-to-motion generation method, which leverages foundation models to incorporate scene context into motion synthesis.

Findings

01. Experiments demonstrate effective scene-aware motion generation.
02. High generalizability across diverse 3D scenes and text prompts.
03. Reduces the need for expensive ground-truth motion datasets.

Abstract

Text-to-motion generation has recently garnered significant research interest, primarily focusing on generating human motion sequences in blank backgrounds. However, human motions commonly occur within diverse 3D scenes, which has prompted exploration into scene-aware text-to-motion generation methods. Yet existing scene-aware methods often rely on large-scale ground-truth motion sequences in diverse 3D scenes, which poses practical challenges due to their high cost. To mitigate this challenge, we are the first to propose a Training-free Scene-aware Text-to-Motion framework, dubbed TSTMotion, that efficiently empowers pre-trained blank-background motion generators with scene-aware capability. Specifically, conditioned on the given 3D scene and text description, we jointly adopt foundation models to reason, predict, and validate a…

Taxonomy

Topics: Human Motion and Animation · Video Analysis and Summarization · Multimodal Machine Learning Applications

Methods: ADaptive gradient method with the OPTimal convergence rate