WOD-E2E: Waymo Open Dataset for End-to-End Driving in Challenging Long-tail Scenarios

Runsheng Xu; Hubert Lin; Wonseok Jeon; Hao Feng; Yuliang Zou; Liting Sun; John Gorman; Ekaterina Tolstaya; Sarah Tang; Brandyn White; Ben Sapp; Mingxing Tan; Jyh-Jing Hwang; Dragomir Anguelov

arXiv:2510.26125·cs.CV·November 14, 2025

TL;DR

This paper introduces WOD-E2E, a new challenging dataset for end-to-end autonomous driving in rare, long-tail scenarios, along with a novel evaluation metric to better assess system performance in complex real-world conditions.

Contribution

The authors present WOD-E2E, a dataset focused on rare, challenging driving scenarios, and propose the Rater Feedback Score, a new open-loop metric intended to evaluate performance more accurately in such scenarios.

Findings

01

WOD-E2E contains 4,021 driving segments (approximately 12 hours) curated for long-tail scenarios that occur with a frequency of less than 0.03% in daily driving.

02

Rater Feedback Score correlates better with human judgment than traditional metrics.

03

The dataset and metric facilitate research on robust autonomous driving in complex scenarios.

Abstract

Vision-based end-to-end (E2E) driving has garnered significant interest in the research community due to its scalability and synergy with multimodal large language models (MLLMs). However, current E2E driving benchmarks primarily feature nominal scenarios, failing to adequately test the true potential of these systems. Furthermore, existing open-loop evaluation metrics often fall short in capturing the multi-modal nature of driving or effectively evaluating performance in long-tail scenarios. To address these gaps, we introduce the Waymo Open Dataset for End-to-End Driving (WOD-E2E). WOD-E2E contains 4,021 driving segments (approximately 12 hours), specifically curated for challenging long-tail scenarios that are rare in daily life, occurring with a frequency of less than 0.03%. Concretely, each segment in WOD-E2E includes the high-level routing information, ego states, and…
