Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Action Generation

Sai Shashank Kalakonda; Shubh Maheshwari; Ravi Kiran Sarvadevabhatla

arXiv:2211.15603 · cs.CV · March 8, 2023

TL;DR

Action-GPT uses large language models to expand terse action phrases into detailed descriptions. Feeding these descriptions to motion generation models improves text-to-motion alignment and synthesis quality, allows several descriptions to be used per action, and enables zero-shot generation.

Contribution

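The paper introduces a plug-and-play framework that improves text-based action generation by replacing the original action phrases with richer LLM-generated descriptions. It is compatible with both stochastic (e.g. VAE-based) and deterministic (e.g. MotionCLIP) text-to-motion models and enables zero-shot motion synthesis.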

Findings

01. Improved qualitative and quantitative motion synthesis quality.

02. Effective use of multiple LLM-generated descriptions.

03. Demonstrated zero-shot generation capabilities.

Abstract

We introduce Action-GPT, a plug-and-play framework for incorporating Large Language Models (LLMs) into text-based action generation models. Action phrases in current motion capture datasets contain minimal and to-the-point information. By carefully crafting prompts for LLMs, we generate richer and fine-grained descriptions of the action. We show that utilizing these detailed descriptions instead of the original action phrases leads to better alignment of text and motion spaces. We introduce a generic approach compatible with stochastic (e.g. VAE-based) and deterministic (e.g. MotionCLIP) text-to-motion models. In addition, the approach enables multiple text descriptions to be utilized. Our experiments show (i) noticeable qualitative and quantitative improvement in the quality of synthesized motions, (ii) benefits of utilizing multiple LLM-generated descriptions, (iii) suitability of the…
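
The pipeline described in the abstract (prompt an LLM with the action phrase, collect several detailed descriptions, and pass them to a text-to-motion model in place of the original phrase) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt wording, the `fake_llm` stand-in, the hashed bag-of-words encoder, and the choice to average the description embeddings are all hypothetical and are not taken from the abstract.

```python
import hashlib
from typing import Callable, List

# Hypothetical prompt template; the abstract does not give the exact wording.
PROMPT_TEMPLATE = (
    "Describe in detail the body movements of a person "
    "performing the action: {action}"
)


def build_prompt(action_phrase: str) -> str:
    """Wrap a short action phrase (e.g. 'jump') in the prompt template."""
    return PROMPT_TEMPLATE.format(action=action_phrase.strip())


def toy_text_encoder(text: str, dim: int = 8) -> List[float]:
    """Stand-in text encoder: hashed bag-of-words, normalised to sum to 1.

    A real system would use a learned sentence or CLIP-style encoder.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        index = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[index] += 1.0
    total = sum(vec)
    return [v / total for v in vec] if total else vec


def describe_and_embed(
    action_phrase: str,
    llm: Callable[[str], str],
    encoder: Callable[[str], List[float]] = toy_text_encoder,
    k: int = 3,
) -> List[float]:
    """Query the LLM k times and average the description embeddings.

    Averaging is an assumption made for this sketch; it is one simple way
    to combine multiple descriptions into a single text embedding.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    prompt = build_prompt(action_phrase)
    descriptions = [llm(prompt) for _ in range(k)]
    embeddings = [encoder(d) for d in descriptions]
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / k for i in range(dim)]


def fake_llm(prompt: str) -> str:
    """Canned response standing in for a real LLM call."""
    return (
        "The person bends the knees, swings both arms upward, "
        "and pushes off the ground."
    )


# The averaged embedding would replace the embedding of the raw action
# phrase as the conditioning input to a text-to-motion model.
embedding = describe_and_embed("jump", fake_llm, k=3)
```

Because the LLM and the encoder are passed in as callables, the same wrapper could sit in front of different text-to-motion models, which is the sense in which the abstract calls the framework plug-and-play.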

Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · Video Analysis and Summarization