Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model

Yin Wang; Zhiying Leng; Frederick W. B. Li; Shun-Cheng Wu; Xiaohui Liang

arXiv:2309.06284 · cs.CV · September 13, 2023 · 2 cites

Open Access

TL;DR

This paper introduces Fg-T2M, a novel diffusion-based method for fine-grained, text-driven human motion generation that achieves high accuracy and control over motion sequences by leveraging linguistic structures and progressive reasoning.

Contribution

The paper presents a new approach combining linguistics-structure assistance and context-aware reasoning for precise text-driven human motion generation, outperforming existing methods.

Findings

01. Outperforms previous methods on HumanML3D and KIT datasets.

02. Generates motion sequences that better match detailed text descriptions.

03. Achieves higher visual and semantic consistency in generated motions.

Abstract

Text-driven human motion generation in computer vision is both significant and challenging. However, current methods are limited to producing either deterministic or imprecise motion sequences, failing to effectively control the temporal and spatial relationships required to conform to a given text description. In this work, we propose a fine-grained method for generating high-quality, conditional human motion sequences supporting precise text descriptions. Our approach consists of two key components: 1) a linguistics-structure assisted module that constructs accurate and complete language features to fully utilize text information; and 2) a context-aware progressive reasoning module that learns neighborhood and overall semantic linguistics features from shallow and deep graph neural networks to achieve multi-step inference. Experiments show that our approach outperforms text-driven…
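
The abstract's contrast between "shallow" and "deep" graph networks can be illustrated with a toy example. The sketch below is not the Fg-T2M architecture: the sentence, the dependency edges, and the helper names (`build_neighbors`, `propagate`, `receptive_field`) are all hypothetical, and plain mean aggregation stands in for a learned graph layer. It only shows the general idea that one round of message passing mixes a word with its direct neighbors, while several rounds let information from the whole sentence reach every word.

```python
# Toy illustration only; NOT the authors' model. The parse below is a
# hand-written, hypothetical dependency tree for "person walks forward then sits".
words = ["person", "walks", "forward", "then", "sits"]
edges = [(1, 0), (1, 2), (1, 4), (4, 3)]  # (head, dependent) index pairs


def build_neighbors(num_nodes, edge_list):
    """Undirected neighbor sets, with a self-loop on every node."""
    neighbors = {i: {i} for i in range(num_nodes)}
    for a, b in edge_list:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def propagate(features, neighbors, num_layers):
    """Apply mean aggregation over neighbors num_layers times."""
    current = [list(vec) for vec in features]
    for _ in range(num_layers):
        updated = []
        for node in range(len(current)):
            group = [current[n] for n in sorted(neighbors[node])]
            dim = len(group[0])
            updated.append(
                [sum(vec[d] for vec in group) / len(group) for d in range(dim)]
            )
        current = updated
    return current


def receptive_field(vec):
    """Words whose one-hot signal has reached this node."""
    return [words[i] for i, value in enumerate(vec) if value > 0]


# One-hot features make it easy to see which words influence each node.
one_hot = [[1.0 if i == j else 0.0 for j in range(len(words))]
           for i in range(len(words))]
neighbors = build_neighbors(len(words), edges)

shallow = propagate(one_hot, neighbors, num_layers=1)  # neighborhood context
deep = propagate(one_hot, neighbors, num_layers=3)     # sentence-wide context

print(receptive_field(shallow[0]))  # words reaching "person" after 1 layer
print(receptive_field(deep[0]))     # words reaching "person" after 3 layers
```

With one layer, "person" only sees itself and its head "walks"; with three layers, every word in this small tree contributes to its representation.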

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Multimodal Machine Learning Applications