FG-MDM: Towards Zero-Shot Human Motion Generation via ChatGPT-Refined Descriptions

Xu Shi; Wei Yao; Chuanchen Luo; Junran Peng; Hongwen Zhang; Yunlian Sun

arXiv:2312.02772·cs.CV·December 6, 2024·1 cites

Open Access

TL;DR

FG-MDM introduces a novel zero-shot human motion generation framework that leverages ChatGPT-refined descriptions and a part-token diffusion model to produce diverse motions beyond training data.

Contribution

The paper presents a divide-and-conquer approach to zero-shot motion generation: a large language model rewrites coarse annotations into fine-grained, per-body-part descriptions, and a transformer-based diffusion model conditioned on those descriptions generates the motion.

Findings

01

FG-MDM outperforms previous methods in zero-shot settings.

02

It generates diverse human motions beyond the scope of the original datasets.

03

Fine-grained textual annotations improve motion generation quality.

Abstract

Recently, significant progress has been made in text-based motion generation, enabling the generation of diverse and high-quality human motions that conform to textual descriptions. However, generating motions beyond the distribution of the original datasets, i.e., zero-shot generation, remains challenging. By adopting a divide-and-conquer strategy, we propose a new framework named Fine-Grained Human Motion Diffusion Model (FG-MDM) for zero-shot human motion generation. Specifically, we first parse previous vague textual annotations into fine-grained descriptions of different body parts by leveraging a large language model. We then use these fine-grained descriptions to guide a transformer-based diffusion model, which further adopts a design of part tokens. FG-MDM can generate human motions beyond the scope of the original datasets owing to descriptions that are closer to the essence of motion. Our…
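To make the first step of the abstract concrete (turning a vague annotation into per-body-part descriptions), here is a minimal sketch. It is an illustration under stated assumptions, not the authors' pipeline: the body-part list, the prompt wording, the `part: description` reply format, and the function names `build_prompt` and `parse_response` are all hypothetical, and no language model is actually called.

```python
# Hypothetical body-part list; the parts used in the paper are not given in this summary.
BODY_PARTS = ["head", "torso", "left arm", "right arm", "left leg", "right leg"]


def build_prompt(annotation: str, parts=BODY_PARTS) -> str:
    """Compose a prompt asking a language model for one line per body part."""
    lines = [
        f'Describe the motion "{annotation}" for each body part.',
        "Answer with exactly one line per part, formatted as 'part: description'.",
        "Parts: " + ", ".join(parts),
    ]
    return "\n".join(lines)


def parse_response(response: str, parts=BODY_PARTS) -> dict:
    """Parse 'part: description' lines; parts missing from the reply map to ''."""
    parsed = {part: "" for part in parts}
    for line in response.splitlines():
        name, sep, desc = line.partition(":")
        key = name.strip().lower()
        # Ignore lines without a colon and parts outside the assumed list.
        if sep and key in parsed:
            parsed[key] = desc.strip()
    return parsed


# Hand-written stand-in for a model reply (not real model output).
example_reply = "Left arm: swings forward\nright leg: steps ahead\nTail: wags"
fine_grained = parse_response(example_reply)
```

In a full system, each per-part description would presumably be encoded and passed to the diffusion model as conditioning; how FG-MDM does this with its part tokens is not detailed in this summary, so that step is left out.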

Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · 3D Shape Modeling and Analysis

Methods: Diffusion