VersatileMotion: A Unified Framework for Motion Synthesis and Comprehension

Zeyu Ling; Bo Han; Shiyang Li; Jikang Cheng; Hongdeng Shen; Changqing Zou

arXiv:2411.17335 · cs.CV · May 27, 2025

TL;DR

VersatileMotion is a multimodal motion LLM that unifies at least nine motion-related tasks, supports both single-agent and multi-agent motions, and enables cross-modal translation among motion, text, music, and speech. It reports state-of-the-art results on seven tasks.

Contribution

It introduces a novel motion tokenizer combining VQ-VAE and flow matching, and a unified framework supporting nine motion tasks, including cross-modal translation and multi-agent motion understanding.
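
To make the tokenizer idea concrete, the following is a minimal sketch of the vector-quantization step that a VQ-VAE-style tokenizer relies on: each continuous latent frame is replaced by the index of its nearest codebook entry. The codebook values, the latent frames, and the function names are illustrative assumptions, not taken from the paper, and the flow-matching component is not shown.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each latent frame to the index of its nearest codebook vector."""
    tokens = []
    for frame in frames:
        distances = [squared_distance(frame, code) for code in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens


# Hypothetical 2-D codebook with three entries (assumption for illustration).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
# Hypothetical encoder outputs for three motion frames.
latents = [[0.1, -0.1], [0.9, 0.2], [0.2, 0.8]]

tokens = quantize(latents, codebook)
print(tokens)  # [0, 1, 2]
```

The resulting integer indices are what an autoregressive model can consume as discrete motion tokens.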

Findings

1. Achieves state-of-the-art performance on seven tasks.
2. Supports cross-modal translation between motion, text, music, and speech.
3. Handles both single-agent and multi-agent motions in a unified framework.

Abstract

Large language models (LLMs) are, by design, inherently capable of multi-task learning: through a unified next-token prediction paradigm, they can naturally address a wide variety of downstream tasks. Prior work in the motion domain has demonstrated some generality by adapting LLMs via a Motion Tokenizer coupled with an autoregressive Transformer to generate and understand human motion. However, this generality remains limited in scope and yields only modest performance gains. We introduce VersatileMotion, a unified multimodal motion LLM that combines a novel motion tokenizer, integrating VQ-VAE with flow matching, and an autoregressive transformer backbone to seamlessly support at least nine distinct motion-related tasks. VersatileMotion is the first method to handle single-agent and multi-agent motions in a single framework and enable cross-modal conversion between motion, text,…
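
One common way to let a single next-token predictor handle both text and motion is to append the motion codebook to the text vocabulary and wrap motion spans in boundary tokens. The sketch below illustrates that general idea only; the vocabulary size, codebook size, special-token IDs, and function names are all hypothetical and are not claimed to match the paper's actual layout.

```python
from typing import List, Sequence

# All sizes and IDs below are assumptions chosen for illustration.
TEXT_VOCAB_SIZE = 32000        # hypothetical text vocabulary size
MOTION_CODEBOOK_SIZE = 512     # hypothetical number of motion codes
BOM_ID = TEXT_VOCAB_SIZE + MOTION_CODEBOOK_SIZE  # begin-of-motion marker
EOM_ID = BOM_ID + 1                              # end-of-motion marker


def motion_to_vocab_ids(codes: Sequence[int]) -> List[int]:
    """Shift motion code indices past the text vocabulary and add markers."""
    for code in codes:
        if not 0 <= code < MOTION_CODEBOOK_SIZE:
            raise ValueError(f"motion code out of range: {code}")
    return [BOM_ID] + [TEXT_VOCAB_SIZE + code for code in codes] + [EOM_ID]


def build_sequence(prompt_ids: Sequence[int],
                   motion_codes: Sequence[int]) -> List[int]:
    """Concatenate text prompt IDs with the wrapped motion token IDs."""
    return list(prompt_ids) + motion_to_vocab_ids(motion_codes)


# Hypothetical text prompt IDs followed by three motion codes.
sequence = build_sequence([101, 2054], [0, 5, 511])
print(sequence)  # [101, 2054, 32512, 32000, 32005, 32511, 32513]
```

With a shared ID space like this, generating motion from text and describing motion in text both reduce to predicting the next ID in one sequence.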

Code & Models

Repositories

ZeyuLing/MotionLLaMA (PyTorch, official)

Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · Spatial Cognition and Navigation