CASIM: Composite Aware Semantic Injection for Text to Motion Generation

Che-Jui Chang; Qingze Tony Liu; Honglu Zhou; Vladimir Pavlovic; Mubbasir Kapadia

arXiv:2502.02063·cs.CV·February 5, 2025

TL;DR

CASIM is a composite-aware semantic injection mechanism for text-to-motion generation. Instead of relying on a single fixed-length text embedding, it learns the correspondence between text and motion tokens, which improves motion quality, text-motion alignment, and controllability across both autoregressive and diffusion-based models.

Contribution

The paper presents CASIM, a model-agnostic, composite-aware semantic injection mechanism that significantly improves text-to-motion generation quality and controllability over existing fixed-length embedding methods.

Findings

01. CASIM improves motion quality and alignment scores on benchmarks.

02. CASIM enables more precise motion control from text prompts.

03. CASIM generalizes well to unseen text inputs.

Abstract

Recent advances in generative modeling and tokenization have driven significant progress in text-to-motion generation, leading to enhanced quality and realism in generated motions. However, effectively leveraging textual information for conditional motion generation remains an open challenge. We observe that current approaches, primarily relying on fixed-length text embeddings (e.g., CLIP) for global semantic injection, struggle to capture the composite nature of human motion, resulting in suboptimal motion quality and controllability. To address this limitation, we propose the Composite Aware Semantic Injection Mechanism (CASIM), comprising a composite-aware semantic encoder and a text-motion aligner that learns the dynamic correspondence between text and motion tokens. Notably, CASIM is model and representation-agnostic, readily integrating with both autoregressive and diffusion-based…
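
To make the contrast in the abstract concrete, here is a minimal sketch of global injection versus token-level injection. It assumes the text-motion aligner behaves like plain scaled dot-product cross-attention, which is an illustrative stand-in and not the architecture from the paper. The function names, the toy 2-dimensional vectors, and the additive way the context is combined with each motion token are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def global_injection(motion_tokens, text_tokens):
    """Fixed-length conditioning (assumed form): mean-pool the text tokens
    into one vector and add that same vector to every motion token."""
    dim = len(text_tokens[0])
    pooled = [sum(t[d] for t in text_tokens) / len(text_tokens) for d in range(dim)]
    return [[m + p for m, p in zip(tok, pooled)] for tok in motion_tokens]


def token_level_injection(motion_tokens, text_tokens):
    """Token-level conditioning (assumed form): each motion token attends
    over the individual text tokens and receives its own context vector."""
    dim = len(text_tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in motion_tokens:
        w = softmax([dot(query, key) / scale for key in text_tokens])
        context = [sum(w_i * t[d] for w_i, t in zip(w, text_tokens)) for d in range(dim)]
        outputs.append([m + c for m, c in zip(query, context)])
        weights.append(w)
    return outputs, weights


# Toy data: two text tokens (say "walk" and "wave") and two motion tokens,
# one early and one late in the sequence. All values are made up.
text_tokens = [[4.0, 0.0], [0.0, 4.0]]
motion_tokens = [[2.0, 0.0], [0.0, 2.0]]

global_out = global_injection(motion_tokens, text_tokens)
local_out, attn = token_level_injection(motion_tokens, text_tokens)
```

In this toy setup, global injection shifts both motion tokens by the same pooled vector, while the attention weights in `attn` differ per motion token, so each part of the motion can follow a different part of the prompt. That per-token correspondence is the property the abstract attributes to composite-aware injection.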

Taxonomy

Topics: Human Motion and Animation · Handwritten Text Recognition Techniques · Human Pose and Action Recognition

Methods: Attentive Walk-Aggregating Graph Neural Network