Light-T2M: A Lightweight and Fast Model for Text-to-motion Generation
Ling-An Zeng, Guohong Huang, Gaojie Wu, Wei-Shi Zheng

TL;DR
Light-T2M is a novel lightweight and fast text-to-motion (T2M) generation model. It cuts parameter count and inference time while improving motion quality. It does this by emphasizing local information modeling and by adaptively injecting textual information.
Contribution
The paper introduces a lightweight T2M model with four parts: a Local Information Modeling Module, Mamba-based sequence modeling, a Pseudo-bidirectional Scan, and an Adaptive Textual Information Injector.
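This page does not describe the internals of the Local Information Modeling Module. The sketch below illustrates what "local" modeling means for a motion sequence. It assumes a depthwise 1D convolution along time, which is a common lightweight choice, and it is not taken from the paper. Each output frame depends only on a small window of neighboring frames, and each channel has its own small kernel. The function name `depthwise_conv1d` and the [time][channels] layout are hypothetical.

```python
from typing import List

Sequence = List[List[float]]  # hypothetical layout: [time][channels]


def depthwise_conv1d(x: Sequence, kernels: List[List[float]]) -> Sequence:
    """Per-channel temporal convolution with zero 'same' padding.

    Assumption for illustration: local modeling is a depthwise conv, so
    kernels[c] touches only channel c and the parameter count is
    channels * kernel_size, with no cross-channel weights.
    """
    T, C = len(x), len(x[0])
    k = len(kernels[0])
    assert k % 2 == 1, "an odd kernel size keeps output frames aligned with input frames"
    half = k // 2
    out = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for c in range(C):
            acc = 0.0
            for j in range(k):
                src = t + j - half  # index of the neighboring frame
                if 0 <= src < T:    # frames outside the sequence count as zero
                    acc += kernels[c][j] * x[src][c]
            out[t][c] = acc
    return out
```

Take a 3-tap kernel of ones on a single channel with input frames 1, 2, 3, 4. Each output frame sums itself and its two neighbors, giving 3, 6, 9, 7. The edge frames see zero padding.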
Findings
Parameter count reduced to 10% of the state-of-the-art model
Inference time decreased by 16%
Achieved better FID scores on benchmark datasets
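The first two findings are phrased differently. "Reduced to 10%" is a ratio: the new model is one tenth the size, which is a 90% reduction. "Decreased by 16%" is a relative drop. The small helpers below make that distinction explicit. The function names are hypothetical, and the test inputs are round illustrative numbers, not the paper's measurements.

```python
def fraction_of(baseline: float, new: float) -> float:
    """Size of `new` as a fraction of `baseline` (the "reduced to X%" reading)."""
    return new / baseline


def relative_decrease(baseline: float, new: float) -> float:
    """Fractional drop from `baseline` to `new` (the "decreased by X%" reading).

    Equivalent to 1 - fraction_of(baseline, new).
    """
    return (baseline - new) / baseline
```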
Abstract
Despite the significant role text-to-motion (T2M) generation plays across various applications, current methods involve a large number of parameters and suffer from slow inference speeds, leading to high usage costs. To address this, we aim to design a lightweight model to reduce usage costs. First, unlike existing works that focus solely on global information modeling, we recognize the importance of local information modeling in the T2M task by reconsidering the intrinsic properties of human motion, leading us to propose a lightweight Local Information Modeling Module. Second, we introduce Mamba to the T2M task, reducing the number of parameters and GPU memory demands, and we have designed a novel Pseudo-bidirectional Scan to replicate the effects of a bidirectional scan without increasing parameter count. Moreover, we propose a novel Adaptive Textual Information Injector that more…
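The abstract says the Pseudo-bidirectional Scan replicates the effect of a bidirectional scan without adding parameters. This excerpt does not say how. One plausible way to get that property is the following, and it is assumed here rather than taken from the paper. Reverse half of the feature channels in time, run a single unidirectional scan, then reverse those channels back. The sketch uses a toy per-channel linear recurrence as a stand-in for a Mamba layer. `causal_scan`, `pseudo_bidirectional_scan`, and the half-and-half channel split are all hypothetical.

```python
from typing import List

Sequence = List[List[float]]  # hypothetical layout: [time][channels]


def causal_scan(x: Sequence, a: List[float], b: List[float]) -> Sequence:
    """Toy per-channel recurrence h_t = a*h_{t-1} + b*x_t (a stand-in for Mamba).

    The output at frame t depends only on frames 0..t, as in a unidirectional scan.
    """
    T, C = len(x), len(x[0])
    h = [0.0] * C
    out = []
    for t in range(T):
        h = [a[c] * h[c] + b[c] * x[t][c] for c in range(C)]
        out.append(list(h))
    return out


def pseudo_bidirectional_scan(x: Sequence, a: List[float], b: List[float]) -> Sequence:
    """Flip the second half of the channels in time, scan once, then flip back.

    Uses exactly the same parameters (a, b) as causal_scan. First-half channels
    summarize past frames, and second-half channels summarize future frames.
    """
    T, C = len(x), len(x[0])
    half = C // 2
    # Reverse time for channels half..C-1 only.
    mixed = [x[t][:half] + x[T - 1 - t][half:] for t in range(T)]
    y = causal_scan(mixed, a, b)
    # Undo the reversal on the second half so the frames line up again.
    return [y[t][:half] + y[T - 1 - t][half:] for t in range(T)]
```

A true bidirectional scan would run a second scan in reverse with its own (a, b), which doubles these parameters. In this sketch, one set of (a, b) serves both directions. The trade-off is that each direction covers only half of the channels.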
Taxonomy
Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Human Motion and Animation
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Focus
