AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward

Haonan Han; Xiangzuo Wu; Huan Liao; Zunnan Xu; Zhongyuan Hu; Ronghui Li; Yachao Zhang; Xiu Li

arXiv:2411.18654 · cs.CV · December 2, 2024

Open Access

TL;DR

This paper introduces AToM, a framework that improves event-level text-to-motion alignment by using GPT-4Vision to annotate generated motions in detail and applying reinforcement learning with the resulting reward, so that generated motion follows the text prompt more closely.

Contribution

The paper presents a novel approach combining GPT-4Vision-based annotation with reinforcement learning to enhance event-level text-to-motion alignment.

Findings

01 Significant improvement in event-level alignment quality.

02 Effective use of GPT-4Vision for detailed motion annotation.

03 Enhanced motion generation accuracy from textual prompts.

Abstract

Recently, text-to-motion models have opened new possibilities for creating realistic human motion with greater efficiency and flexibility. However, aligning motion generation with event-level textual descriptions presents unique challenges due to the complex relationship between textual prompts and desired motion outcomes. To address this, we introduce AToM, a framework that enhances the alignment between generated motion and text prompts by leveraging reward from GPT-4Vision. AToM comprises three main stages: Firstly, we construct a dataset MotionPrefer that pairs three types of event-level textual prompts with generated motions, which cover the integrity, temporal relationship and frequency of motion. Secondly, we design a paradigm that utilizes GPT-4Vision for detailed motion annotation, including visual data formatting, task-specific instructions and scoring rules for each sub-task.…
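
As a rough illustration of the first two stages described in the abstract, the sketch below shows one way scored motions could be stored and turned into preference pairs. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `EventTask`, `ScoredMotion`, and `to_preference_pairs`, the scalar `score` field, and the best-versus-worst pairing rule are all hypothetical. Only the three sub-task names (integrity, temporal relationship, frequency) come from the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class EventTask(Enum):
    # The three event-level aspects named in the abstract.
    INTEGRITY = "integrity"
    TEMPORAL = "temporal"
    FREQUENCY = "frequency"


@dataclass(frozen=True)
class ScoredMotion:
    prompt: str       # event-level text prompt
    task: EventTask   # sub-task the prompt belongs to
    motion_id: str    # hypothetical identifier of a generated motion
    score: float      # hypothetical scalar score assigned by the annotator


def to_preference_pairs(records):
    """Group scored motions by prompt and pair the best with the worst.

    Assumption: a simple best-vs-worst rule; the source does not say
    how annotation scores are converted into preferences.
    """
    by_prompt = defaultdict(list)
    for r in records:
        by_prompt[(r.prompt, r.task)].append(r)

    pairs = []
    for (prompt, task), group in by_prompt.items():
        best = max(group, key=lambda r: r.score)
        worst = min(group, key=lambda r: r.score)
        if best.score > worst.score:  # ties carry no preference signal
            pairs.append((prompt, task, best.motion_id, worst.motion_id))
    return pairs
```

For example, two motions generated for the same temporal prompt with scores 3.0 and 1.0 would yield one pair (higher-scored motion preferred), while a prompt with a single scored motion would yield none.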

Taxonomy

Topics: Scientific Computing and Data Management · Topic Modeling