LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset

Royden Wagner; Omer Sahin Tas; Jaime Villa; Felix Hauser; Yinzhe Shen; Marlon Steiner; Dominik Strutz; Carlos Fernandez; Christian Kinzig; Guillermo S. Guitierrez-Cabello; Hendrik Königshof; Fabian Immel; Richard Schwarzkopf; Nils Alexander Rack; Kevin Rösch; Kaiwen Wang; Jan-Hendrik Pauls; Martin Lauer; Igor Gilitschenski; Holger Caesar; Christoph Stiller

arXiv:2603.23607 · cs.CV · April 7, 2026


TL;DR

This paper introduces a multimodal dataset of long-tail driving scenarios that pairs high-level instructions with multilingual reasoning traces, with the aim of improving generalization to rare events in self-driving models.

Contribution

The dataset provides multi-view videos, trajectories, high-level instructions, and reasoning traces in English, Spanish, and Chinese, enabling research on reasoning and generalization in autonomous driving.
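To make the listed components concrete, here is a minimal sketch of how one scenario record could be represented in Python. The class name, field names, language codes, and example values are all hypothetical, chosen for illustration only; they are not the dataset's actual schema, which this page does not describe.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Language codes are an assumption; the source only names English, Spanish, and Chinese.
TRACE_LANGUAGES = ("en", "es", "zh")


@dataclass
class ScenarioRecord:
    """Hypothetical container for one long-tail driving scenario."""

    scenario_id: str
    camera_videos: dict[str, str]          # view name -> video file path (assumed layout)
    trajectory: list[tuple[float, float]]  # (x, y) waypoints; units and frame are assumptions
    instruction: str                       # high-level driving instruction
    reasoning_traces: dict[str, str] = field(default_factory=dict)  # language code -> trace text

    def missing_languages(self) -> list[str]:
        """Return the expected language codes that have no reasoning trace."""
        return [lang for lang in TRACE_LANGUAGES if lang not in self.reasoning_traces]


# Made-up example values, not taken from the dataset.
record = ScenarioRecord(
    scenario_id="example-0001",
    camera_videos={"front": "front.mp4", "rear": "rear.mp4"},
    trajectory=[(0.0, 0.0), (1.5, 0.1), (3.0, 0.4)],
    instruction="Turn left at the next intersection.",
    reasoning_traces={
        "en": "A cyclist is crossing, so yield before turning.",
        "es": "Un ciclista está cruzando, así que cede el paso antes de girar.",
    },
)
```

A helper like `missing_languages` is one way to check that a record carries a trace in every expected language before using it for multilingual evaluation.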

Findings

01

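The benchmark evaluates multimodal models such as VLMs and VLAs on instruction following and semantic coherence between model outputs, going beyond safety and comfort metrics.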

02

Reasoning traces are provided in English, Spanish, and Chinese, written by domain experts with diverse cultural backgrounds.

03

The dataset supports studying how different forms of reasoning affect driving competence.

Abstract

In real-world domains such as self-driving, generalization to rare scenarios remains a fundamental challenge. To address this, we introduce a new dataset designed for end-to-end driving that focuses on long-tail driving events. We provide multi-view video data, trajectories, high-level instructions, and detailed reasoning traces, facilitating in-context learning and few-shot generalization. The resulting benchmark for multimodal models, such as VLMs and VLAs, goes beyond safety and comfort metrics by evaluating instruction following and semantic coherence between model outputs. The multilingual reasoning traces in English, Spanish, and Chinese are from domain experts with diverse cultural backgrounds. Thus, our dataset is a unique resource for studying how different forms of reasoning affect driving competence. Our dataset is available at:…

Code & Models

Repositories

https://hf.co/datasets/kit-mrt/kitscenes-longtail

Models

ValeraZSD/kitscenes-longtail-solution
model

Datasets

KIT-MRT/KITScenes-LongTail
dataset · 395 downloads
