TSTMotion: Training-free Scene-aware Text-to-motion Generation
Ziyan Guo, Haoxuan Qu, Hossein Rahmani, Dewen Soh, Ping Hu, Qiuhong Ke, and Jun Liu

TL;DR
TSTMotion is a novel training-free framework that enables pre-trained blank-background motion generators to produce scene-aware human motions based on 3D scenes and text descriptions, reducing reliance on large-scale ground-truth data.
Contribution
It introduces the first training-free scene-aware text-to-motion generation method that leverages foundation models to incorporate scene context into motion synthesis.
Findings
Effective scene-aware motion generation demonstrated through experiments.
High generalizability across diverse 3D scenes and text prompts.
Reduces the need for expensive ground-truth motion datasets.
Abstract
Text-to-motion generation has recently garnered significant research interest, primarily focusing on generating human motion sequences in blank backgrounds. However, human motions commonly occur within diverse 3D scenes, which has prompted exploration into scene-aware text-to-motion generation methods. Yet existing scene-aware methods often rely on large-scale ground-truth motion sequences in diverse 3D scenes, which are expensive to collect and therefore pose practical challenges. To mitigate this challenge, we are the first to propose a Training-free Scene-aware Text-to-Motion framework, dubbed TSTMotion, that efficiently empowers pre-trained blank-background motion generators with scene-aware capability. Specifically, conditioned on the given 3D scene and text description, we adopt foundation models together to reason, predict and validate a…
Taxonomy
Topics: Human Motion and Animation · Video Analysis and Summarization · Multimodal Machine Learning Applications
Methods: ADaptive gradient method with the OPTimal convergence rate
