FTMoMamba: Motion Generation with Frequency and Text State Space Models

Chengjian Li; Xiangbo Shu; Qiongjie Cui; Yazhou Yao; Jinhui Tang

arXiv:2411.17532 · cs.CV · November 27, 2024



TL;DR

FTMoMamba introduces a novel diffusion framework that leverages frequency and text state space models to improve human motion generation, capturing fine-grained motions and aligning text semantics with generated motions.

Contribution

The paper proposes FTMoMamba, a diffusion-based model with Frequency and Text State Space Models, to better capture motion details and semantic consistency in text-to-motion generation.

Findings

01. Achieves the lowest FID of 0.181 on the HumanML3D dataset.

02. Decomposes motion sequences into low-frequency and high-frequency components to guide detailed generation.

03. Aligns textual semantics with motion sequences for improved text-motion consistency.

Abstract

Diffusion models achieve impressive performance in human motion generation. However, current approaches typically ignore the significance of frequency-domain information in capturing fine-grained motions within the latent space (e.g., low frequencies correlate with static poses, and high frequencies align with fine-grained motions). Additionally, there is a semantic discrepancy between text and motion, leading to inconsistency between the generated motions and the text descriptions. In this work, we propose a novel diffusion-based FTMoMamba framework equipped with a Frequency State Space Model (FreqSSM) and a Text State Space Model (TextSSM). Specifically, to learn fine-grained representation, FreqSSM decomposes sequences into low-frequency and high-frequency components, guiding the generation of static pose (e.g., sits, lay) and fine-grained motions (e.g., transition, stumble),…
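As a rough illustration of the low-frequency/high-frequency intuition described in the abstract, the sketch below splits a toy motion sequence along the time axis with a simple FFT mask. It is not the paper's FreqSSM: the listing does not say how the decomposition is computed, so the FFT-based split, the function name `split_frequency_bands`, and the `cutoff_ratio` parameter are all assumptions made for illustration.

```python
import numpy as np


def split_frequency_bands(motion, cutoff_ratio=0.25):
    """Split a (frames, features) sequence into low- and high-frequency parts.

    Hypothetical helper for illustration only; not the FreqSSM from the paper.
    """
    motion = np.asarray(motion, dtype=np.float64)
    n_frames = motion.shape[0]

    # Real FFT along the time axis: one spectrum per feature channel.
    spectrum = np.fft.rfft(motion, axis=0)
    n_bins = spectrum.shape[0]

    # Keep the lowest fraction of frequency bins (always at least the DC bin).
    n_low = max(1, int(np.ceil(cutoff_ratio * n_bins)))
    low_spectrum = np.zeros_like(spectrum)
    low_spectrum[:n_low] = spectrum[:n_low]

    # Low band: slowly varying, pose-like content.
    low = np.fft.irfft(low_spectrum, n=n_frames, axis=0)
    # High band: the residual, i.e. fast, fine-grained variation.
    high = motion - low
    return low, high


# Toy one-channel "motion": a slow sway plus a small, fast jitter.
frames = np.arange(64)
slow_pose = np.sin(2 * np.pi * 1 * frames / 64)
fast_jitter = 0.1 * np.sin(2 * np.pi * 20 * frames / 64)
toy_motion = (slow_pose + fast_jitter)[:, None]

low, high = split_frequency_bands(toy_motion)
print(np.allclose(low + high, toy_motion))
```

Because the high band is defined as the residual, the two bands always sum back to the input. In this toy case the slow sway falls in the low band and the jitter in the high band, which mirrors the static-pose versus fine-grained-motion distinction the abstract draws.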


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: ALIGN