AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism

Chongyang Zhong; Lei Hu; Zihao Zhang; Shihong Xia

arXiv:2309.00796 · cs.CV · September 6, 2023 · 1 cites

TL;DR

AttT2M introduces a two-stage, multi-perspective attention approach for generating diverse, natural 3D human motions from text, outperforming existing methods in quality and detail.

Contribution

The paper proposes a novel two-stage framework with body-part and cross-modal attention mechanisms for improved text-driven human motion synthesis.

Findings

01 Outperforms state-of-the-art on HumanML3D and KIT-ML datasets

02 Achieves fine-grained, diverse motion generation

03 Demonstrates superior qualitative and quantitative results

Abstract

Generating 3D human motion from textual descriptions has been a research focus in recent years. The generated motion must be diverse, natural, and consistent with the textual description. Because of the complex spatio-temporal nature of human motion and the difficulty of learning the cross-modal relationship between text and motion, text-driven motion generation remains a challenging problem. To address these issues, we propose AttT2M, a two-stage method with a multi-perspective attention mechanism: body-part attention and global-local motion-text attention. The former works from the motion embedding perspective: it introduces a body-part spatio-temporal encoder into VQ-VAE to learn a more expressive discrete latent space. The latter works from the cross-modal perspective and is used to learn the sentence-level and word-level motion-text…
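The "discrete latent space" mentioned in the abstract comes from the vector-quantization step of a VQ-VAE, in which each continuous latent vector is replaced by the index of its nearest codebook entry. The following is only a minimal sketch of that generic lookup, not the authors' implementation: the function names (`quantize`, `decode_indices`), the toy 2-D codebook, and the latent values are all assumptions made for illustration, and the body-part spatio-temporal encoder that would produce the latents is not modeled here.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each latent vector to the index of its nearest codebook entry."""
    indices = []
    for z in latents:
        distances = [squared_distance(z, c) for c in codebook]
        indices.append(distances.index(min(distances)))
    return indices


def decode_indices(indices: Sequence[int],
                   codebook: Sequence[Sequence[float]]) -> List[List[float]]:
    """Look up the codebook vector for each discrete token index."""
    return [list(codebook[i]) for i in indices]


# Hypothetical toy data: a 3-entry codebook and three encoder outputs.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
latents = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]]

tokens = quantize(latents, codebook)            # discrete motion tokens
quantized = decode_indices(tokens, codebook)    # vectors passed to a decoder
print(tokens)
```

In a two-stage pipeline of this general kind, the first stage would learn the codebook and the second stage would predict token sequences like `tokens` conditioned on text; the sketch covers only the lookup itself.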

Code & Models

Repositories

zcymonkey/attt2m (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Multimodal Machine Learning Applications

Methods: VQ-VAE · Focus