AvatarGPT: All-in-One Framework for Motion Understanding, Planning, Generation and Beyond

Zixiang Zhou; Yu Wan; Baoyuan Wang

arXiv:2311.16468·cs.CV·November 29, 2023·1 cites

Open Access

TL;DR

AvatarGPT is a comprehensive framework that unifies multiple human motion understanding and generation tasks using a shared large language model, enabling seamless task integration and long-motion synthesis.

Contribution

It introduces an all-in-one LLM-based framework for human motion tasks, including encoding motion as tokens and enabling iterative long-motion synthesis.

Findings

01. Achieves state-of-the-art on low-level motion tasks

02. Demonstrates promising results on high-level tasks

03. Enables unlimited long-motion synthesis through task traversal

Abstract

Large Language Models (LLMs) have shown remarkable emergent abilities in unifying almost all (if not all) NLP tasks. In the human motion-related realm, however, researchers still develop siloed models for each task. Inspired by InstructGPT and the generalist concept behind Gato, we introduce AvatarGPT, an All-in-One framework for motion understanding, planning, and generation, as well as other tasks such as motion in-between synthesis. AvatarGPT treats each task as one type of instruction fine-tuned on the shared LLM. All the tasks are seamlessly interconnected with language as the universal interface, constituting a closed loop within the framework. To achieve this, human motion sequences are first encoded as discrete tokens, which serve as the extended vocabulary of the LLM. Then, an unsupervised pipeline to generate natural language descriptions of human action sequences from in-the-wild…
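
The idea of encoding motion as discrete tokens that extend the LLM vocabulary can be illustrated with a minimal sketch. It assumes a nearest-neighbour codebook lookup, which is one common way to discretize continuous features; the text above does not specify the tokenizer, so this is not the authors' implementation. The codebook is random rather than learned, and every name and size here (`TEXT_VOCAB_SIZE`, `CODEBOOK_SIZE`, `FRAME_DIM`, `quantize`, `to_llm_ids`, `from_llm_ids`) is hypothetical.

```python
import random

# All sizes below are hypothetical, chosen only for illustration.
TEXT_VOCAB_SIZE = 32000  # assumed size of the LLM's original text vocabulary
CODEBOOK_SIZE = 8        # assumed number of discrete motion codes
FRAME_DIM = 4            # assumed per-frame motion feature dimension


def make_codebook(size, dim, seed=0):
    """Random stand-in for a codebook; a real one would be learned from data."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(size)]


def quantize(frames, codebook):
    """Map each frame to the index of its nearest codebook entry (squared L2)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: dist(frame, codebook[k]))
            for frame in frames]


def to_llm_ids(motion_codes, text_vocab_size=TEXT_VOCAB_SIZE):
    """Shift motion codes past the text vocabulary so the ids never collide."""
    return [text_vocab_size + c for c in motion_codes]


def from_llm_ids(token_ids, text_vocab_size=TEXT_VOCAB_SIZE):
    """Recover motion codes from ids in the extended range; text ids are skipped."""
    return [t - text_vocab_size for t in token_ids if t >= text_vocab_size]


codebook = make_codebook(CODEBOOK_SIZE, FRAME_DIM)
# Frames copied from codebook entries 3, 5, 3 quantize back to those indices.
frames = [codebook[3], codebook[5], codebook[3]]
codes = quantize(frames, codebook)
ids = to_llm_ids(codes)
print(codes, ids)
```

With this layout, a single token stream can mix text ids (below `TEXT_VOCAB_SIZE`) and motion ids (at or above it), which is what lets language act as the shared interface between tasks in a setup like the one described.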

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Human Motion and Animation