Context-based Motion Retrieval using Open Vocabulary Methods for Autonomous Driving

Stefan Englmeier, Max A. Büttner, Katharina Winter, and Fabian B. Flohr

arXiv:2508.00589 · cs.CV · August 13, 2025

TL;DR

This paper introduces a novel context-aware motion retrieval framework for autonomous driving, combining multimodal embeddings and a new dataset, WayMoCo, to improve retrieval accuracy of complex human behaviors in driving scenarios.

Contribution

The paper presents a new multimodal motion retrieval method using SMPL-based sequences and introduces the WayMoCo dataset for evaluating such retrieval in autonomous driving.

Findings

01

Outperforms state-of-the-art models by up to 27.5% in retrieval accuracy.

02

Enables scalable retrieval of human behavior and context via natural language queries.

03

Provides a new dataset, WayMoCo, for evaluating motion-context retrieval in autonomous driving.

Abstract

Autonomous driving systems must operate reliably in safety-critical scenarios, particularly those involving unusual or complex behavior by Vulnerable Road Users (VRUs). Identifying these edge cases in driving datasets is essential for robust evaluation and generalization, but retrieving such rare human behavior scenarios within the long tail of large-scale datasets is challenging. To support targeted evaluation of autonomous driving systems in diverse, human-centered scenarios, we propose a novel context-aware motion retrieval framework. Our method combines Skinned Multi-Person Linear (SMPL)-based motion sequences and corresponding video frames before encoding them into a shared multimodal embedding space aligned with natural language. Our approach enables the scalable retrieval of human behavior and its context through text queries. This work also introduces our dataset WayMoCo, an…
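
The retrieval step described in the abstract, ranking stored motion-context items against a text query in a shared embedding space, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 3-dimensional vectors stand in for the outputs of hypothetical text and motion-context encoders, the item names are invented, and cosine similarity is an assumed ranking metric that the summary above does not confirm.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, gallery, k=2):
    """Return the names of the k gallery items most similar to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in gallery.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Assumption: these vectors were produced by some motion-context encoder.
# The values and labels are made up for illustration only.
gallery = {
    "pedestrian_crossing": [0.85, 0.15, 0.00],
    "cyclist_turning": [0.15, 0.85, 0.05],
    "person_waiting": [0.05, 0.05, 0.85],
}

# Assumption: a text encoder mapped the query string to this vector.
query = [1.0, 0.1, 0.0]

top = retrieve(query, gallery, k=2)
print(top)
```

In a real system the gallery would hold one embedding per motion sequence and the exhaustive scan would typically be replaced by an approximate nearest-neighbor index, but the ranking logic stays the same.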
