Fail2Drive: Benchmarking Closed-Loop Driving Generalization

Simon Gerstenecker; Andreas Geiger; Katrin Renz

arXiv:2604.08535·cs.RO·April 10, 2026

TL;DR

Fail2Drive is a paired-route benchmark for evaluating closed-loop driving generalization in CARLA. It exposes substantial failures in state-of-the-art models and provides tools for scenario creation and validation.

Contribution

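The paper presents the first paired-route benchmark for closed-loop generalization in CARLA, with 200 routes and 17 new scenario classes spanning appearance, layout, behavioral, and robustness shifts, plus an open-source toolbox to support reproducible research on driving generalization.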

Findings

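01  Across the evaluated state-of-the-art models, success rate drops by 22.8% on average when moving from in-distribution routes to their shifted counterparts.

02  Failure modes uncovered by the benchmark include ignoring objects that are visible in LiDAR and misunderstanding free space.

03  The benchmark and its open-source toolbox support systematic, reproducible evaluation and improvement of autonomous driving models.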

Abstract

Generalization under distribution shift remains a central bottleneck for closed-loop autonomous driving. Although simulators like CARLA enable safe and scalable testing, existing benchmarks rarely measure true generalization: they typically reuse training scenarios at test time. Success can therefore reflect memorization rather than robust driving behavior. We introduce Fail2Drive, the first paired-route benchmark for closed-loop generalization in CARLA, with 200 routes and 17 new scenario classes spanning appearance, layout, behavioral, and robustness shifts. Each shifted route is matched with an in-distribution counterpart, isolating the effect of the shift and turning qualitative failures into quantitative diagnostics. Evaluating multiple state-of-the-art models reveals consistent degradation, with an average success-rate drop of 22.8%. Our analysis uncovers unexpected failure…
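The paired-route idea in the abstract can be illustrated with a small sketch. This is not the Fail2Drive toolbox or its API: the `RoutePair` record, the function names, and the toy outcomes are all hypothetical, and the sketch assumes the drop is measured in absolute percentage points over a binary per-route success flag, which this page does not confirm.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass(frozen=True)
class RoutePair:
    """Hypothetical record: one in-distribution route and its shifted counterpart."""
    name: str
    base_success: bool     # in-distribution route was completed successfully
    shifted_success: bool  # matched shifted route was completed successfully


def success_rate(outcomes: Iterable[bool]) -> float:
    """Percentage of successful routes."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("need at least one route outcome")
    return 100.0 * sum(outcomes) / len(outcomes)


def paired_drop(pairs: Sequence[RoutePair]) -> float:
    """Success-rate drop in percentage points (assumed convention)."""
    base = success_rate(p.base_success for p in pairs)
    shifted = success_rate(p.shifted_success for p in pairs)
    return base - shifted


def regressions(pairs: Sequence[RoutePair]) -> List[str]:
    """Pairs where the model succeeds in-distribution but fails under the shift."""
    return [p.name for p in pairs if p.base_success and not p.shifted_success]


# Toy data for illustration only; these are not results from the paper.
pairs = [
    RoutePair("pair_a", True, True),
    RoutePair("pair_b", True, False),
    RoutePair("pair_c", False, False),
    RoutePair("pair_d", True, False),
]

print(paired_drop(pairs))   # drop in percentage points
print(regressions(pairs))   # route pairs that fail only under the shift
```

Because each shifted route shares everything but the shift with its counterpart, the `regressions` list points at failures attributable to the shift itself rather than to route difficulty, which is what turns a qualitative failure into a per-pair diagnostic.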

Code & Models

Repositories

autonomousvision/fail2drive (GitHub)
