Unimotion: Unifying 3D Human Motion Synthesis and Understanding
Chuqiao Li, Julian Chibane, Yannan He, Naama Pearl, Andreas Geiger, Gerard Pons-Moll

TL;DR
Unimotion is a unified model for 3D human motion synthesis and understanding. It can be controlled with global text, frame-level text, or both, and it outputs frame-level text paired with the generated poses.
Contribution
It introduces the first unified multi-task human motion model capable of both flexible motion control and frame-level motion understanding with paired text outputs.
Findings
Achieves state-of-the-art results on the HumanML3D dataset.
Enables hierarchical control and motion editing via text.
Outputs frame-level text paired with generated poses.
Abstract
We introduce Unimotion, the first unified multi-task human motion model capable of both flexible motion control and frame-level motion understanding. While existing works control avatar motion with global text conditioning, or with fine-grained per-frame scripts, none can do both at once. In addition, none of the existing works can output frame-level text paired with the generated poses. In contrast, Unimotion allows users to control motion with global text, local frame-level text, or both at once, providing more flexible control. Importantly, Unimotion is the first model that by design outputs local text paired with the generated poses, allowing users to know what motion happens and when, which is necessary for a wide range of applications. We show Unimotion opens up new applications: 1.) Hierarchical control, allowing users to specify motion at different levels of detail, 2.)…
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation
