FlowCoMotion: Text-to-Motion Generation via Token-Latent Flow Modeling
Dawei Guan, Di Yang, Chengjie Jin, Jiangtao Wang

TL;DR
FlowCoMotion introduces a unified token-latent flow modeling framework for text-to-motion generation, effectively capturing semantic content and motion details through combined continuous and discrete representations.
Contribution
It proposes a novel token-latent coupling approach that unifies continuous and discrete motion representations for improved text-to-motion synthesis.
Findings
Achieves competitive results on HumanML3D and SnapMoGen benchmarks.
Captures both semantic content and fine-grained motion details through token-latent coupling.
Outperforms existing methods in text-to-motion generation tasks.
Abstract
Text-to-motion generation is driven by learning motion representations for semantic alignment with language. Existing methods rely on either continuous or discrete motion representations. However, continuous representations entangle semantics with dynamics, while discrete representations lose fine-grained motion details. In this context, we propose FlowCoMotion, a novel motion generation framework that unifies both treatments from a modeling perspective. Specifically, FlowCoMotion employs token-latent coupling to capture both semantic content and high-fidelity motion details. In the latent branch, we apply multi-view distillation to regularize the continuous latent space, while in the token branch we use discrete temporal resolution quantization to extract high-level semantic cues. The motion latent is then obtained by combining the representations from the two branches through a…
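The trade-off the abstract describes, where discrete tokens keep coarse semantics but lose fine-grained detail, can be illustrated with a toy sketch. This is not the paper's architecture. The 1-D trajectory, the five-entry codebook, the stride of 4, and the helper names `quantize` and `dequantize` are all hypothetical choices made for illustration. The sketch quantizes a signal at reduced temporal resolution and shows that the leftover residual is the detail a continuous latent would have to carry.

```python
import math


def quantize(frames, codebook, stride):
    """Map each window of `stride` frames to the index of the nearest codebook value.

    Averaging over the window reduces the temporal resolution before the
    nearest-neighbour lookup (an assumed, simplified scheme).
    """
    tokens = []
    for start in range(0, len(frames), stride):
        window = frames[start:start + stride]
        mean = sum(window) / len(window)
        idx = min(range(len(codebook)), key=lambda k: abs(codebook[k] - mean))
        tokens.append(idx)
    return tokens


def dequantize(tokens, codebook, stride, length):
    """Expand tokens back to frame rate by repeating each codebook value."""
    out = []
    for idx in tokens:
        out.extend([codebook[idx]] * stride)
    return out[:length]


# Toy "joint angle" trajectory: a slow swing plus a small fast jitter.
frames = [
    math.sin(2 * math.pi * t / 32) + 0.05 * math.sin(2 * math.pi * t / 4)
    for t in range(64)
]
codebook = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical codebook
stride = 4                               # hypothetical temporal downsampling factor

tokens = quantize(frames, codebook, stride)
coarse = dequantize(tokens, codebook, stride, len(frames))

# What the discrete tokens cannot represent; a continuous branch would need to model this.
residual = [f - c for f, c in zip(frames, coarse)]

print("tokens:", tokens)
print("max |residual|:", max(abs(r) for r in residual))
```

In this sketch the token sequence follows the slow swing, while the fast jitter and the codebook rounding error end up entirely in `residual`. How FlowCoMotion actually combines its two branches is not recoverable from the truncated abstract above, so no combination step is shown.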
