Marrying Text-to-Motion Generation with Skeleton-Based Action Recognition

Jidong Kuang; Hongsong Wang; Jie Gui

arXiv:2604.17090·cs.CV·April 21, 2026

TL;DR

This paper introduces CoAMD (Coordinates-based Autoregressive Motion Diffusion), a unified model that links human action recognition and motion generation on skeleton coordinates and reports state-of-the-art results across multiple tasks.

Contribution

The work presents a unified framework that pairs a Multi-modal Action Recognizer (MAR) with coarse-to-fine, diffusion-based motion synthesis on skeleton coordinates. The recognizer supplies gradient-based semantic guidance to the generator, connecting motion understanding with motion generation.

Findings

1. Achieves state-of-the-art performance on 13 benchmarks.
2. Effectively handles four tasks: recognition, generation, retrieval, and editing.
3. Demonstrates the versatility of skeleton-based motion modeling.

Abstract

Human action recognition and motion generation are two active research problems in human-centric computer vision, both aiming to align motion with textual semantics. However, most existing works study these two problems separately, without uncovering the links between them, namely that motion generation requires semantic comprehension. This work investigates unified action recognition and motion generation by leveraging skeleton coordinates for both motion understanding and generation. We propose Coordinates-based Autoregressive Motion Diffusion (CoAMD), which synthesizes motion in a coarse-to-fine manner. As a core component of CoAMD, we design a Multi-modal Action Recognizer (MAR) that provides gradient-based semantic guidance for motion generation. Furthermore, we establish a rigorous benchmark by evaluating baselines on absolute coordinates. Our model can be applied to four…
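
The abstract's idea of a recognizer providing gradient-based semantic guidance to a diffusion generator can be illustrated with a minimal sketch in the style of classifier guidance. This is not the authors' implementation: the linear-softmax recognizer, the function names (`recognizer_log_prob_grad`, `guided_mean`), the `scale` parameter, and the toy skeleton dimensions are all assumptions made for illustration, and MAR's real architecture and guidance rule are not described in this summary.

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over a 1-D vector of logits.
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def recognizer_log_prob_grad(x, W, b, label):
    # Hypothetical stand-in for a recognizer: a linear-softmax classifier
    # over flattened skeleton coordinates x.
    # Gradient of log p(label | x) w.r.t. x is W[label] - sum_k p_k * W[k].
    probs = np.exp(log_softmax(W @ x + b))
    return W[label] - probs @ W

def guided_mean(mean, sigma2, x, W, b, label, scale=1.0):
    # Classifier-guidance style update: shift the denoiser's predicted mean
    # along the recognizer's gradient so the sample drifts toward `label`.
    return mean + scale * sigma2 * recognizer_log_prob_grad(x, W, b, label)

# Toy setup (assumed sizes): 5 joints with 3-D coordinates, 3 action classes.
rng = np.random.default_rng(0)
num_joints, num_classes = 5, 3
dim = num_joints * 3
W = rng.normal(size=(num_classes, dim))
b = np.zeros(num_classes)

x_t = rng.normal(size=dim)   # noisy skeleton frame at the current step
mean = x_t.copy()            # stand-in for the denoiser's predicted mean
label = 1                    # target action class

shifted = guided_mean(mean, sigma2=0.01, x=x_t, W=W, b=b, label=label)
before = log_softmax(W @ mean + b)[label]
after = log_softmax(W @ shifted + b)[label]
print(f"log p(label) before: {before:.4f}, after: {after:.4f}")
```

With a small step, the guided mean receives a higher log-probability for the target class under the toy recognizer than the unguided mean does, which is the effect semantic guidance is meant to have. A real system would replace the linear classifier with a learned recognizer and obtain the gradient through automatic differentiation.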

Code & Models

Repositories

jidongkuang/CoAMD (GitHub)
