FrankenMotion: Part-level Human Motion Generation and Composition
Chuqiao Li, Xianghui Xie, Yong Cao, Andreas Geiger, Gerard Pons-Moll

TL;DR
FrankenMotion introduces a novel framework for fine-grained, part-level human motion generation guided by temporally-structured text prompts, enabled by a new high-quality dataset with atomic annotations, allowing enhanced controllability and motion composition.
Contribution
This work is the first to provide a dataset with atomic, temporally-aware part-level annotations and a diffusion-based model enabling detailed, controllable human motion generation from text.
Findings
Outperforms previous models in motion generation quality.
Enables composition of motions unseen during training.
Provides fine-grained control over individual body parts and actions.
Abstract
Human motion generation from text prompts has made remarkable progress in recent years. However, existing methods primarily rely on either sequence-level or action-level descriptions due to the absence of fine-grained, part-level motion annotations. This limits their controllability over individual body parts. In this work, we construct a high-quality motion dataset with atomic, temporally-aware part-level text annotations, leveraging the reasoning capabilities of large language models (LLMs). Unlike prior datasets that either provide synchronized part captions with fixed time segments or rely solely on global sequence labels, our dataset captures asynchronous and semantically distinct part movements at fine temporal resolution. Based on this dataset, we introduce a diffusion-based part-aware motion generation framework, namely FrankenMotion, where each body part is guided by its own…
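To make the idea of asynchronous, part-level annotations concrete, here is a minimal sketch of how such labels could be represented and queried per frame. It is an illustration only, not the dataset's actual schema: the part names, the `PartSegment` class, the `prompts_at` helper, and the frame ranges are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical body-part names; the real part set is not stated in this summary.
PARTS = ("head", "left_arm", "right_arm", "torso", "legs")


@dataclass(frozen=True)
class PartSegment:
    part: str   # one of PARTS
    start: int  # first frame of the segment (inclusive)
    end: int    # end frame of the segment (exclusive)
    text: str   # atomic description of what this part does


def prompts_at(segments, frame):
    """Return {part: text} for every segment active at the given frame."""
    active = {}
    for seg in segments:
        if seg.start <= frame < seg.end:
            active[seg.part] = seg.text
    return active


# Segments for different parts start and end independently (asynchronous),
# unlike a single sequence-level caption or synchronized fixed windows.
segments = [
    PartSegment("right_arm", 0, 40, "waves"),
    PartSegment("legs", 20, 90, "walks forward"),
    PartSegment("head", 60, 90, "turns left"),
]

for frame in (10, 30, 70):
    print(frame, prompts_at(segments, frame))
```

In this sketch each part carries its own timeline, so a frame can have any subset of parts described, which is the property the abstract contrasts with fixed, synchronized time segments.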
Taxonomy
Topics: Human Motion and Animation · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
