Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs

Qi Wu; Yubo Zhao; Yifan Wang; Xinhang Liu; Yu-Wing Tai; Chi-Keung Tang

arXiv:2405.17013 · cs.CV · October 8, 2024 · 1 citation

TL;DR

Motion-Agent introduces a conversational framework that uses a pre-trained language model to generate, edit, and understand 3D human motions interactively, matching the performance of diffusion models while fine-tuning only 1–3% of the model's parameters.

Contribution

The paper presents Motion-Agent, a novel framework that leverages an open-source pre-trained language model for versatile human motion generation, editing, and understanding, requiring only lightweight adapter fine-tuning instead of training from scratch.

Findings

01

Performance comparable to diffusion models with only 1–3% of parameters fine-tuned via adapters

02

Enables complex motion generation through multi-turn conversations with GPT-4

03

Supports a wide range of motion-language tasks interactively
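The abstract attributes the 1–3% figure to adapters but does not name the adapter type in this summary. As a rough sanity check of how such a small fraction arises, the sketch below assumes LoRA-style low-rank adapters on square attention projections; the 2B base size, layer count, hidden size, and rank are hypothetical values chosen for illustration, not the paper's configuration.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds beside one frozen d_out x d_in weight.

    The low-rank update is B @ A with A: (rank x d_in) and B: (d_out x rank).
    """
    return rank * (d_in + d_out)


def trainable_fraction(base_params: int, adapted_shapes, rank: int) -> float:
    """Fraction of all parameters (frozen base + adapters) that are trained."""
    added = sum(lora_params(d_in, d_out, rank) for d_in, d_out in adapted_shapes)
    return added / (base_params + added)


# Hypothetical setup: 24 layers, hidden size 2048, adapters on the
# query/key/value/output projections (each 2048 x 2048), LoRA rank 64.
hidden, layers, rank = 2048, 24, 64
shapes = [(hidden, hidden)] * 4 * layers
frac = trainable_fraction(2_000_000_000, shapes, rank)
print(f"{frac:.2%}")
```

With these made-up numbers the adapters add about 25.2M parameters, roughly 1.24% of the total. The added count grows linearly with the rank and with the number of adapted matrices, so adapting feed-forward layers too, or raising the rank, moves the fraction up accordingly.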

Abstract

While previous approaches to 3D human motion generation have achieved notable success, they often rely on extensive training and are limited to specific tasks. To address these challenges, we introduce Motion-Agent, an efficient conversational framework designed for general human motion generation, editing, and understanding. Motion-Agent employs an open-source pre-trained language model to develop a generative agent, MotionLLM, that bridges the gap between motion and text. This is accomplished by encoding and quantizing motions into discrete tokens that align with the language model's vocabulary. With only 1–3% of the model's parameters fine-tuned using adapters, MotionLLM delivers performance on par with diffusion models and other transformer-based methods trained from scratch. By integrating MotionLLM with GPT-4 without additional training, Motion-Agent is able to generate highly…
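The abstract says motions are encoded and quantized into discrete tokens that align with the language model's vocabulary. A minimal sketch of the quantization and vocabulary-mapping steps, assuming a VQ-VAE-style nearest-neighbor codebook lookup and motion tokens appended after the text vocabulary with two hypothetical boundary tokens. The motion encoder and decoder are omitted, and the toy codebook, latent values, and vocabulary size are made up for illustration.

```python
import numpy as np


def quantize(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Return, for each latent frame (T x D), the index of the nearest code (K x D)."""
    # Pairwise squared Euclidean distances, shape (T, K).
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def to_token_ids(codes, text_vocab_size: int, codebook_size: int) -> list:
    """Shift motion codes past the text vocabulary and wrap them in boundary tokens."""
    som = text_vocab_size + codebook_size      # hypothetical <start_of_motion> id
    eom = text_vocab_size + codebook_size + 1  # hypothetical <end_of_motion> id
    return [som] + [text_vocab_size + int(c) for c in codes] + [eom]


def from_token_ids(ids, text_vocab_size: int, codebook_size: int) -> list:
    """Recover motion codes from generated ids, dropping text and boundary tokens."""
    return [i - text_vocab_size for i in ids
            if text_vocab_size <= i < text_vocab_size + codebook_size]


# Toy example: a 3-entry codebook in 2-D and three encoded frames.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
latents = np.array([[0.1, -0.2], [4.8, 5.1], [0.9, 1.2]])
codes = quantize(latents, codebook)               # nearest codes: 0, 2, 1
ids = to_token_ids(codes, text_vocab_size=32000, codebook_size=len(codebook))
print(ids)                                        # [32003, 32000, 32002, 32001, 32004]
assert from_token_ids(ids, 32000, len(codebook)) == codes.tolist()
```

Because motion ids occupy their own range in this sketch, a single next-token objective can cover both directions: text-to-motion (emit ids in the motion range) and motion-to-text (read them as input). Turning generated ids back into poses would require the omitted decoder.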

Code & Models

Repositories

szqwu/Motion-Agent
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Language, Metaphor, and Cognition · Subtitles and Audiovisual Media

Methods: Attention Is All You Need · Dense Connections · Adam · Linear Layer · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing · Dropout · Byte Pair Encoding · Absolute Position Encodings