Chronologically Accurate Retrieval for Temporal Grounding of Motion-Language Models

Kent Fujiwara; Mikihiro Tanaka; Qing Yu

arXiv:2407.15408 · cs.CV · July 23, 2024

TL;DR

This paper highlights the importance of temporal accuracy in motion-language models, introduces the CAR evaluation to identify chronological misalignments, and proposes training with shuffled event sequences to improve temporal understanding.

Contribution

It introduces the Chronologically Accurate Retrieval (CAR) task and a matching training method, emphasizing the need for temporal alignment in motion-language models, an aspect that previous work has largely overlooked.

Findings

01

CAR reveals that many models fail to understand the chronological order of events.

02

Training with shuffled event sequences improves temporal alignment.

03

Models trained this way show better performance in text-motion retrieval and generation.
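Finding 02 concerns training with shuffled event sequences. The page does not spell out the loss, so what follows is only a minimal sketch under one assumption: the event-shuffled versions of a motion's own description are appended as extra hard negatives in a standard InfoNCE-style contrastive loss. The function name `info_nce_with_shuffled_negatives` and the temperature default are hypothetical, and the inputs are precomputed similarity scores rather than outputs of any real model.

```python
import math


def info_nce_with_shuffled_negatives(sim_pos, sim_batch_negs, sim_shuffled_negs, temperature=0.07):
    """Hypothetical InfoNCE loss for one motion sample (an assumption, not the paper's code).

    sim_pos: similarity between the motion and its chronologically correct text.
    sim_batch_negs: similarities to other texts in the batch (ordinary negatives).
    sim_shuffled_negs: similarities to event-shuffled copies of the correct text
        (hard negatives that differ from the positive only in event order).
    """
    logits = [sim_pos] + list(sim_batch_negs) + list(sim_shuffled_negs)
    scaled = [s / temperature for s in logits]
    # Numerically stable log-sum-exp over all candidates.
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    # Equals -log softmax(positive); the positive sits at index 0.
    return log_sum_exp - scaled[0]
```

Because a shuffled description shares every word with the positive, its similarity tends to sit close to `sim_pos`. Adding it to the denominator raises the loss unless the model separates the two texts by event order: `info_nce_with_shuffled_negatives(0.9, [0.1, 0.2], [0.85])` returns a larger loss than the same call without the shuffled negative.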

Abstract

With the release of large-scale motion datasets with textual annotations, the task of establishing a robust latent space for language and 3D human motion has recently witnessed a surge of interest. Methods have been proposed to convert human motion and texts into features to achieve accurate correspondence between them. Despite these efforts to align language and motion representations, we claim that the temporal element is often overlooked, especially for compound actions, resulting in chronological inaccuracies. To shed light on the temporal alignment in motion-language latent spaces, we propose Chronologically Accurate Retrieval (CAR) to evaluate the chronological understanding of the models. We decompose textual descriptions into events, and prepare negative text samples by shuffling the order of events in compound action descriptions. We then design a simple task for…
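The negative-sampling step described in the abstract can be sketched as follows. This is a hedged illustration, not the authors' code. The splitter `split_into_events` assumes events are joined by ", then" or "and then"; the paper's actual decomposition method is not given here. `car_correct` assumes a sample counts as correct only when the motion embedding is closer to the original description than to every shuffled variant, since the abstract is truncated before the exact task definition. All function names are hypothetical, and embeddings are plain lists standing in for the outputs of some motion and text encoder.

```python
import itertools
import math
import random


def split_into_events(description: str) -> list[str]:
    # Hypothetical splitter: assumes events are joined by ", then" or "and then".
    normalized = description.replace(" and then ", ", then ")
    return [e.strip() for e in normalized.split(", then ") if e.strip()]


def shuffled_negatives(events: list[str], max_negatives: int = 5, seed: int = 0) -> list[str]:
    # Every distinct reordering of the events except the original order.
    orders = sorted({p for p in itertools.permutations(events) if list(p) != events})
    random.Random(seed).shuffle(orders)  # deterministic for a fixed seed
    return [", then ".join(p) for p in orders[:max_negatives]]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def car_correct(motion_emb, positive_emb, negative_embs) -> bool:
    # Correct only if the chronologically accurate text scores strictly higher
    # than every event-shuffled variant.
    pos = cosine(motion_emb, positive_emb)
    return all(pos > cosine(motion_emb, n) for n in negative_embs)


def car_accuracy(samples) -> float:
    # samples: iterable of (motion_emb, positive_emb, negative_embs) triples.
    results = [car_correct(m, p, n) for m, p, n in samples]
    return sum(results) / len(results) if results else 0.0
```

For example, `split_into_events("a person walks forward, then turns around and then sits down")` yields three events, and `shuffled_negatives` turns them into the five other orderings. The number of permutations grows factorially, so the `max_negatives` cap matters for descriptions with many events.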

Code & Models

Models

🤗 line-corporation/ChronAccRet (model)

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Human Motion and Animation

Methods: ALIGN