Bench2Drive-Robust: Benchmarking Closed-Loop Autonomous Driving under Deployment Perturbations

Zhiyuan Zhang, Zhenghao Jin, Yanlun Peng, Xianda Guo, Haoran Liu, Shaofeng Zhang, Xingjun Ma, Zuxuan Wu, Junchi Yan, Xiaosong Jia, and Yu-Gang Jiang

arXiv:2605.18059 · cs.RO · May 19, 2026


TL;DR

Bench2Drive-Robust introduces a device-centric benchmark for evaluating the robustness of closed-loop autonomous driving systems under realistic deployment perturbations, highlighting challenges not captured by traditional image corruption tests.

Contribution

Bench2Drive-Robust is, to the authors' knowledge, the first benchmark to systematically evaluate system-level deployment perturbations in closed-loop end-to-end autonomous driving, with an emphasis on how they affect robustness and stability.

Findings

01

Deployment perturbations significantly degrade driving performance.

02

Traditional image-level tests do not fully capture real-world robustness challenges.

03

The benchmark encourages the development of deployment-aware, robust autonomous driving systems.

Abstract

Robustness is a critical requirement for deploying autonomous driving systems in the real world. Existing robustness benchmarks for autonomous driving have made important progress in studying the effects of image-level corruptions, such as adverse weather or camera degradation, on perception modules and open-loop planning outputs. However, deployment can also involve system-level imperfections, such as inference latency and ego-state estimation errors, which remain less studied in closed-loop E2E-AD evaluation. These imperfections can accumulate through the feedback loop and destabilize control. In this work, we present Bench2Drive-Robust, to our knowledge the first device-centric robustness benchmark for closed-loop end-to-end autonomous driving under realistic deployment perturbations. We systematically evaluate deployment-oriented perturbations arising from three major sources:…
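
The abstract's point that system-level imperfections can accumulate through the feedback loop can be illustrated with a toy simulation. The sketch below is not the benchmark's method or code: it assumes a one-dimensional lateral offset, a proportional controller, and two hypothetical perturbation knobs (`delay_steps` for inference latency, `state_bias` for ego-state estimation error) chosen only for illustration.

```python
from collections import deque


def simulate(delay_steps=0, state_bias=0.0, gain=0.5, steps=200, x0=1.0):
    """Toy closed loop: x is a lateral offset that the controller tries to drive to 0.

    delay_steps and state_bias are hypothetical perturbation parameters,
    not names taken from Bench2Drive-Robust.
    """
    x = x0
    # buffer[0] is always the state from `delay_steps` steps ago.
    buffer = deque([x0] * (delay_steps + 1), maxlen=delay_steps + 1)
    trajectory = [x]
    for _ in range(steps):
        observed = buffer[0] + state_bias  # stale and biased ego-state estimate
        u = -gain * observed               # proportional steering correction
        x = x + u                          # the action feeds back into the next state
        buffer.append(x)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    clean = simulate()
    delayed = simulate(delay_steps=3)
    biased = simulate(state_bias=0.2)
    print("clean final offset:", clean[-1])
    print("delayed peak offset:", max(abs(v) for v in delayed))
    print("biased final offset:", biased[-1])
```

In this toy setting the unperturbed loop converges to zero offset, a constant estimation bias leaves a persistent offset of the same magnitude, and a three-step delay with the same gain makes the oscillation grow instead of decay. Open-loop evaluation would not expose the last effect, because each action there never influences the next observation.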
