Free-T2M: Robust Text-to-Motion Generation for Humanoid Robots via Frequency-Domain
Wenshuo Chen, Haozhe Jia, Songning Lai, Lei Wang, Yuqi Lin, Hongru Xiao, Lijie Hu, Yutao Yue

TL;DR
This paper introduces Free-T2M, a frequency-domain approach to text-to-motion generation for humanoid robots. It improves motion quality and robustness by treating generation as two phases: semantic planning and fine-grained execution.
Contribution
It proposes a novel frequency-domain framework with stage-specific consistency alignment and a temporal-adaptive module, enhancing the stability and accuracy of robot motion synthesis from language.
Findings
Reduces FID from 0.152 to 0.060 on the StableMoFusion baseline.
Improves semantic correctness and motion robustness.
Establishes a new state-of-the-art in diffusion-based T2M tasks.
Abstract
Enabling humanoid robots to synthesize complex, physically coherent motions from natural language commands is a cornerstone of autonomous robotics and human-robot interaction. While diffusion models have shown promise in this text-to-motion (T2M) task, they often generate semantically flawed or unstable motions, limiting their applicability to real-world robots. This paper reframes the T2M problem from a frequency-domain perspective, revealing that the generative process mirrors a hierarchical control paradigm. We identify two critical phases: a semantic planning stage, where low-frequency components establish the global motion trajectory, and a fine-grained execution stage, where high-frequency details refine the movement. To address the distinct challenges of each phase, we introduce Frequency enhanced text-to-motion (Free-T2M), a framework incorporating stage-specific…
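The low-frequency versus high-frequency split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain FFT low-pass filter along the time axis, and the function name `split_frequency_bands` and the `keep_ratio` cutoff are hypothetical choices made only for illustration.

```python
import numpy as np


def split_frequency_bands(motion, keep_ratio=0.1):
    """Split a motion clip of shape (frames, features) into two bands.

    keep_ratio is a hypothetical cutoff (not from the paper): the fraction
    of temporal frequency bins that count as "low frequency".
    """
    motion = np.asarray(motion, dtype=float)
    n_frames = motion.shape[0]

    # Real FFT along the time axis: one spectrum per feature channel.
    spectrum = np.fft.rfft(motion, axis=0)
    n_bins = spectrum.shape[0]
    cutoff = max(1, int(np.ceil(keep_ratio * n_bins)))

    # Keep only the lowest bins, then transform back to the time domain.
    low_spectrum = np.zeros_like(spectrum)
    low_spectrum[:cutoff] = spectrum[:cutoff]
    low = np.fft.irfft(low_spectrum, n=n_frames, axis=0)

    # The residual carries the high-frequency detail.
    high = motion - low
    return low, high


# Toy clip: a slow sweep (global trajectory) plus fast, small jitter (detail).
t = np.linspace(0.0, 1.0, 120, endpoint=False)
trajectory = np.sin(2 * np.pi * 1 * t)
jitter = 0.05 * np.sin(2 * np.pi * 30 * t)
motion = np.stack([trajectory + jitter, np.cos(2 * np.pi * 1 * t)], axis=1)

low, high = split_frequency_bands(motion, keep_ratio=0.1)
```

In this toy clip the slow sweep lands in `low` and the jitter in `high`, which mirrors the abstract's distinction between a global trajectory and fine-grained refinement. How Free-T2M itself defines and uses the two bands is not specified in the excerpt above.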
Taxonomy
Topics: Speech Recognition and Synthesis
Methods: Diffusion · Focus
