Do Open-Loop Metrics Predict Closed-Loop Driving? A Cross-Benchmark Correlation Study of NAVSIM and Bench2Drive

Yiru Wang; Anqing Jiang; Shuo Wang; Yuwen Heng; Hai Yang; Yang Chen; Hao Sun

arXiv:2605.00066·cs.RO·May 4, 2026

TL;DR

This study investigates whether recent open-loop metrics, especially those of NAVSIM v2, reliably predict closed-loop autonomous driving performance. It finds a positive but imperfect correlation and shows that some sub-metrics predict closed-loop results better than others.

Contribution

The paper provides a systematic cross-benchmark analysis showing that certain open-loop metrics, notably Ego Progress, strongly predict closed-loop success, and introduces a simplified predictive formula.

Findings

01

NAVSIM PDM Score correlates positively with closed-loop Driving Score but with ranking inversions.

02

Ego Progress (EP) is the strongest predictor among sub-metrics for closed-loop performance.

03

A simple 3-metric formula matches the predictive power of the full 5-metric PDM Score (PDMS) while retaining a high correlation.
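To make "positive correlation with ranking inversions" concrete, the following is a minimal sketch of how such a paired comparison could be computed. The scores below are made-up illustrative values, not results from the paper, and the function names (`pearson`, `spearman`, `count_inversions`) are hypothetical helpers rather than anything the authors describe. An inversion here means a pair of methods whose open-loop ordering disagrees with their closed-loop ordering.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; assumes all values are distinct (no tie handling)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def count_inversions(xs, ys):
    """Number of method pairs ordered differently by the two scores."""
    pairs = combinations(zip(xs, ys), 2)
    return sum(1 for (x1, y1), (x2, y2) in pairs if (x1 - x2) * (y1 - y2) < 0)


# Hypothetical scores for five methods (NOT taken from the paper).
open_loop_score = [80.0, 84.0, 86.0, 88.0, 90.0]    # e.g. an open-loop aggregate
closed_loop_score = [45.0, 60.0, 55.0, 70.0, 75.0]  # e.g. a closed-loop Driving Score

if __name__ == "__main__":
    print("Pearson r:   ", round(pearson(open_loop_score, closed_loop_score), 3))
    print("Spearman rho:", round(spearman(open_loop_score, closed_loop_score), 3))
    print("Inversions:  ", count_inversions(open_loop_score, closed_loop_score))
```

With these toy values the correlation is strongly positive, yet one pair of methods swaps order between the two benchmarks, which is the kind of pattern the first finding describes.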

Abstract

Open-loop evaluation offers fast, reproducible assessment of autonomous driving planners, but its ability to predict real closed-loop driving performance remains questionable. Prior work has shown that traditional open-loop metrics such as Average Displacement Error (ADE) and Final Displacement Error (FDE) exhibit no reliable correlation with closed-loop Driving Score. In this paper, we ask whether the more recent, safety-aware open-loop metrics introduced by NAVSIM v2 can bridge this gap. By systematically cross-referencing published results from 15 state-of-the-art methods across NAVSIM (open-loop) and Bench2Drive (closed-loop), we compile a paired dataset of open-loop sub-metrics and closed-loop performance, yielding 8 methods with complete paired data. Our analysis reveals three key findings: (1) the aggregate NAVSIM PDM Score shows a strong positive but non-monotonic correlation…
