CoMA: Compositional Human Motion Generation with Multi-modal Agents
Shanlin Sun, Gabriel De Araujo, Jiaqi Xu, Shenghan Zhou, Hanwen Zhang, Ziheng Huang, Chenyu You, Xiaohui Xie

TL;DR
CoMA introduces an agent-based framework that combines large language and vision models with a mask transformer-based motion generator to improve complex human motion generation, editing, and comprehension, particularly for detailed motions unseen in training data.
Contribution
The paper presents a novel multi-agent system paired with a motion generator that uses body-part-specific encoders and codebooks, enabling detailed, controllable, and high-quality human motion synthesis and editing.
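To illustrate why separate codebooks per body part allow finer control, here is a minimal sketch assuming each body part's encoder yields one latent vector that is quantized by nearest-neighbor lookup in that part's own codebook. The two-part split, the codebook contents, and the function names (`nearest_code`, `quantize_by_part`, `dequantize_by_part`) are hypothetical; this summary does not specify CoMA's body-part partition, codebook sizes, or distance metric.

```python
from typing import Dict, List

Vector = List[float]


def nearest_code(vec: Vector, codebook: List[Vector]) -> int:
    """Index of the codebook entry closest to `vec` (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def quantize_by_part(features: Dict[str, Vector], codebooks: Dict[str, List[Vector]]) -> Dict[str, int]:
    """Quantize each body part's latent with that part's own codebook."""
    return {part: nearest_code(vec, codebooks[part]) for part, vec in features.items()}


def dequantize_by_part(tokens: Dict[str, int], codebooks: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Map per-part token ids back to their codebook vectors."""
    return {part: codebooks[part][idx] for part, idx in tokens.items()}


# Hypothetical two-part split with tiny hand-made codebooks (not CoMA's actual setup).
codebooks = {
    "upper_body": [[0.0, 0.0], [1.0, 1.0]],
    "lower_body": [[0.0], [5.0]],
}
features = {"upper_body": [0.9, 1.2], "lower_body": [0.4]}

tokens = quantize_by_part(features, codebooks)  # → {'upper_body': 1, 'lower_body': 0}

# Part-level edit: swap only the lower-body token; the upper-body token is untouched.
edited = {**tokens, "lower_body": 1}
edited_latents = dequantize_by_part(edited, codebooks)
```

Because each part carries its own token, an edit can replace one part's code while every other part's code stays exactly as it was; a single shared codebook over the whole body would not separate the parts this way.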
Findings
Competitive performance on the HumanML3D dataset
Significant improvement in long and detailed motion generation
User studies favor CoMA over existing methods
Abstract
3D human motion generation has seen substantial advancement in recent years. While state-of-the-art approaches have improved performance significantly, they still struggle with complex and detailed motions unseen in training data, largely due to the scarcity of motion datasets and the prohibitive cost of generating new training examples. To address these challenges, we introduce CoMA, an agent-based solution for complex human motion generation, editing, and comprehension. CoMA leverages multiple collaborative agents powered by large language and vision models, alongside a mask transformer-based motion generator featuring body-part-specific encoders and codebooks for fine-grained control. Our framework enables generation of both short and long motion sequences with detailed instructions, text-guided motion editing, and self-correction for improved quality. Evaluations on the HumanML3D…
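The abstract describes the generator only as mask transformer-based. Below is a minimal sketch of MaskGIT-style confidence-based iterative decoding, a technique commonly used by masked motion-token generators, assuming (not confirmed by this summary) that CoMA's generator decodes in a similar way. The cosine schedule, the `MASK` id, `toy_predictor`, and all function names are hypothetical; the predictor stands in for the text-conditioned transformer.

```python
import math
from typing import Callable, List, Tuple

MASK = -1  # hypothetical placeholder id for a masked motion token

# A predictor maps the current (partially masked) sequence to one
# (token_id, confidence) pair per position. In a real system this would be
# the text-conditioned mask transformer; here it is any callable.
Predictor = Callable[[List[int]], List[Tuple[int, float]]]


def num_masked_after(step: int, total_steps: int, seq_len: int) -> int:
    """Cosine schedule: how many tokens stay masked after 0-based `step`."""
    ratio = math.cos(math.pi / 2 * (step + 1) / total_steps)
    return max(0, math.floor(ratio * seq_len))


def iterative_masked_decode(predict: Predictor, seq_len: int, total_steps: int = 4) -> List[int]:
    tokens = [MASK] * seq_len  # start from a fully masked sequence
    for step in range(total_steps):
        preds = predict(tokens)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Fill every currently masked position with its predicted token.
        for i in masked:
            tokens[i] = preds[i][0]
        # Re-mask the least confident of the newly filled positions;
        # tokens committed in earlier steps are never re-masked.
        n_remask = num_masked_after(step, total_steps, seq_len)
        masked.sort(key=lambda i: preds[i][1])
        for i in masked[:n_remask]:
            tokens[i] = MASK
    return tokens


def toy_predictor(tokens: List[int]) -> List[Tuple[int, float]]:
    # Hypothetical stand-in: token id = position % 5, confidence grows with position.
    return [(i % 5, i / len(tokens)) for i in range(len(tokens))]


result = iterative_masked_decode(toy_predictor, seq_len=8, total_steps=4)
# → [0, 1, 2, 3, 4, 0, 1, 2]
```

With 8 tokens and 4 steps, the schedule leaves 7, 5, 3, and then 0 tokens masked, so the most confident predictions are committed first and the final step fills whatever remains.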
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Hand Gesture Recognition Systems
Methods: Sparse Evolutionary Training
