ANT: Adaptive Neural Temporal-Aware Text-to-Motion Model
Wenshuo Chen, Kuimou Yu, Haozhe Jia, Kaishen Yuan, Zexu Huang, Bowen Tian, Songning Lai, Hongru Xiao, Erhang Zhang, Lei Wang, and Yutao Yue

TL;DR
ANT is an adaptive neural architecture for text-to-motion generation, inspired by biological morphogenesis, that dynamically adjusts semantic granularity and guidance during denoising to improve temporal coherence and semantic alignment.
Contribution
The paper proposes ANT, a novel adaptive neural model with spectral analysis-based semantic partitioning and dynamic guidance scheduling for improved text-to-motion synthesis.
Findings
Achieves state-of-the-art semantic alignment when applied to the StableMoFusion baseline.
Significantly improves performance across various baseline models.
Demonstrates effective temporal and semantic control in motion generation.
Abstract
While diffusion models advance text-to-motion generation, their static semantic conditioning ignores temporal-frequency demands: early denoising requires structural semantics for motion foundations, while later stages need localized details for text alignment. This mismatch mirrors biological morphogenesis, where developmental phases demand distinct genetic programs. Inspired by epigenetic regulation governing morphological specialization, we propose **ANT**, an **A**daptive **N**eural **T**emporal-Aware architecture. ANT orchestrates semantic granularity through: **(i) Semantic Temporally Adaptive (STA) Module:** automatically partitions denoising into low-frequency structural planning and high-frequency refinement via spectral analysis. **(ii) Dynamic Classifier-Free Guidance scheduling (DCFG):** adaptively adjusts the conditional-to-unconditional ratio, enhancing efficiency while…
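The DCFG component builds on classifier-free guidance, which blends a conditional and an unconditional prediction using a guidance scale. The block below is a minimal sketch of that blend with a step-dependent scale. The abstract excerpt does not state ANT's actual scheduling rule, so the linear ramp here is an assumption, and the names `guidance_scale`, `guided_prediction`, `w_early`, and `w_late` are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def guidance_scale(step: int, num_steps: int,
                   w_early: float = 1.0, w_late: float = 4.0) -> float:
    """Hypothetical linear schedule for the guidance scale.

    `step` counts denoising steps from 0 (noisiest) to num_steps - 1.
    The linear ramp and its endpoints are assumptions for illustration,
    not the schedule used by ANT.
    """
    if num_steps < 2:
        return w_late
    frac = step / (num_steps - 1)
    return w_early + frac * (w_late - w_early)


def guided_prediction(eps_cond: Sequence[float],
                      eps_uncond: Sequence[float],
                      w: float) -> List[float]:
    """Standard classifier-free guidance blend: uncond + w * (cond - uncond)."""
    return [u + w * (c - u) for c, u in zip(eps_cond, eps_uncond)]


if __name__ == "__main__":
    # Toy two-dimensional "noise predictions" standing in for model outputs.
    cond, uncond = [1.0, 2.0], [0.0, 0.0]
    for step in range(5):
        w = guidance_scale(step, num_steps=5)
        print(step, w, guided_prediction(cond, uncond, w))
```

With `w = 1` the blend returns the conditional prediction unchanged, and with `w = 0` it returns the unconditional one. A dynamic schedule moves between such regimes over the course of denoising; whether the scale should rise or fall over time is a design choice this sketch does not settle.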
