SegMo: Segment-aligned Text to 3D Human Motion Generation
Bowen Dang, Lin Wu, Xiaohang Yang, Zheng Yuan, Zhixiang Chen

TL;DR
SegMo is a segment-aligned framework for text-to-3D human motion generation. It aligns textual descriptions with motion sequences at the segment level rather than the sequence level, which improves generation accuracy and supports retrieval applications.
Contribution
The paper proposes a novel segment-based alignment approach that decomposes both text and motion into semantic segments for improved fine-grained correspondence.
Findings
Achieves a Top-1 score of 0.553 on HumanML3D.
Improves on the baseline's performance on two datasets.
Enables motion grounding and motion-to-text retrieval.
Abstract
Generating 3D human motions from textual descriptions is an important research problem with broad applications in video games, virtual reality, and augmented reality. Recent methods align the textual description with human motion at the sequence level, neglecting the internal semantic structure of modalities. However, both motion descriptions and motion sequences can be naturally decomposed into smaller and semantically coherent segments, which can serve as atomic alignment units to achieve finer-grained correspondence. Motivated by this, we propose SegMo, a novel Segment-aligned text-conditioned human Motion generation framework to achieve fine-grained text-motion alignment. Our framework consists of three modules: (1) Text Segment Extraction, which decomposes complex textual descriptions into temporally ordered phrases, each representing a simple atomic action; (2) Motion Segment…
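To make the idea of Text Segment Extraction concrete, here is a minimal sketch of splitting a motion description into temporally ordered phrases. It is an illustration only: the abstract does not say how SegMo performs this step, so the rule-based splitting on commas and the connectives "and", "then", and "and then", as well as the function name `split_into_segments`, are assumptions made for this example.

```python
import re
from typing import List

# Hypothetical separator pattern (an assumption for illustration, not the
# paper's method): an optional comma followed by a connective, or a bare comma.
_SEPARATOR = re.compile(r"\s*(?:,\s*)?\b(?:and then|then|and)\b\s*|\s*,\s*")


def split_into_segments(description: str) -> List[str]:
    """Split a motion description into phrases, keeping their original order."""
    # Drop surrounding whitespace and a trailing period before splitting.
    text = description.strip().rstrip(".")
    parts = _SEPARATOR.split(text)
    # Discard empty pieces left by adjacent separators.
    return [part.strip() for part in parts if part and part.strip()]


if __name__ == "__main__":
    print(split_into_segments(
        "A person walks forward, then turns around and sits down."
    ))
    # → ['A person walks forward', 'turns around', 'sits down']
```

A rule this simple will mis-split phrases where "and" joins nouns rather than actions (for example "raises arms and legs"), so it should be read as a toy stand-in for whatever extraction procedure the paper actually uses.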
Taxonomy
Topics: Human Motion and Animation · Multimodal Machine Learning Applications · Human Pose and Action Recognition
