KETA: Kinematic-Phrases-Enhanced Text-to-Motion Generation via Fine-grained Alignment

Yu Jiang; Yixing Chen; Xingyang Li

arXiv:2501.15058·cs.CV·January 28, 2025

TL;DR

KETA introduces a novel text-to-motion generation method that uses kinematic phrases as an intermediate representation, improving alignment and motion accuracy through fine-grained supervision and iterative refinement.

Contribution

This work presents KETA, a new approach that decomposes text into kinematic phrases and aligns them with motion segments, enhancing the quality and consistency of generated motions.

Findings

01. KETA achieves up to 1.19x better R precision.

02. KETA reduces FID values by up to 2.34x.

03. It outperforms existing T2M models in accuracy and quality.
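
For readers unfamiliar with the metric, the sketch below shows one common way R precision is computed in text-to-motion evaluation: each text is scored by whether its paired motion ranks among the top-k nearest candidates in a shared embedding space. This is an illustration under stated assumptions, not the paper's evaluation code. The function names `r_precision` and `improvement_factor`, the toy 2-D embeddings, and the reading of "Nx better" as a simple ratio are all assumptions made here. Real evaluations typically use learned feature extractors and larger candidate pools.

```python
import math


def r_precision(text_embs, motion_embs, k=3):
    """Fraction of texts whose paired motion (same index) is in the top-k nearest motions."""
    hits = 0
    for i, text in enumerate(text_embs):
        # Euclidean distance from this text embedding to every candidate motion.
        dists = [math.dist(text, motion) for motion in motion_embs]
        # Rank of the paired motion = number of candidates strictly closer than it.
        rank = sum(d < dists[i] for d in dists)
        if rank < k:
            hits += 1
    return hits / len(text_embs)


def improvement_factor(new, old, higher_is_better=True):
    """Assumed reading of 'Nx better': a ratio, inverted for lower-is-better metrics such as FID."""
    return new / old if higher_is_better else old / new


# Toy embeddings (made up for illustration, not taken from the paper).
texts = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
motions = [[0.1, 0.0], [4.0, 4.0], [1.2, 1.0]]

top1 = r_precision(texts, motions, k=1)
top3 = r_precision(texts, motions, k=3)
```

With only three candidates, top-3 is trivially perfect; the metric becomes informative once the candidate pool is much larger than k.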

Abstract

Motion synthesis plays a vital role in various fields of artificial intelligence. Among the conditions used for motion generation, text can describe motion details elaborately and is easy to acquire, making text-to-motion (T2M) generation important. State-of-the-art T2M techniques mainly leverage diffusion models to generate motions with text prompts as guidance, tackling the many-to-many nature of T2M tasks. However, existing T2M approaches struggle with the gap between the natural language domain and the physical domain, which makes it difficult to generate motions fully consistent with the texts. To address this, we leverage kinematic phrases (KP), an intermediate representation that bridges these two modalities. Our proposed method, KETA, uses a language model to split the given text into several decomposed texts. It trains an aligner to align decomposed texts with the KP…

Code & Models

Repositories

PolarisDane/KETA (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques

Methods: Diffusion · ALIGN · Balanced Selection · Kollen-Pollack Learning