Empirical Insights of Test Selection Metrics under Multiple Testing Objectives and Distribution Shifts

Jingyu Zhang; Fan Wang; Jacky Keung; Yihan Liao; Yan Xiao; Lei Ma

arXiv:2604.23342 · cs.SE · April 28, 2026

TL;DR

This paper provides a comprehensive empirical evaluation of 15 test selection metrics across multiple testing objectives, OOD scenarios, data modalities, and models, addressing gaps in prior research.

Contribution

It introduces a large-scale benchmark and analysis framework to assess metric effectiveness under diverse, realistic testing conditions for DL systems.

Findings

01. Metrics vary significantly in effectiveness across scenarios.

02. Certain metrics perform well for fault detection but poorly for performance estimation.

03. The study highlights the importance of context-specific metric selection.
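
The gap between fault detection and performance estimation can be illustrated with a minimal sketch. It is not the paper's benchmark: the least-confidence score is only an assumed, commonly used uncertainty metric (the summary does not name the 15 metrics studied), the data is synthetic, and all function names are hypothetical. The sketch selects the most uncertain inputs and compares them with a random sample on both objectives.

```python
import random


def least_confidence(probs):
    """Uncertainty score: 1 - highest predicted probability (higher = less confident)."""
    return 1.0 - max(probs)


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring inputs."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def fault_detection_rate(selected, correct):
    """Fraction of the selected inputs that the model misclassifies."""
    return sum(1 for i in selected if not correct[i]) / len(selected)


def estimated_accuracy(selected, correct):
    """Accuracy measured only on the selected inputs."""
    return sum(1 for i in selected if correct[i]) / len(selected)


def make_synthetic_pool(n, seed=0):
    """Build a toy unlabeled pool for a binary classifier.

    Assumption: the model is perfectly calibrated, i.e. a prediction made
    with confidence c is correct with probability c.
    """
    rng = random.Random(seed)
    probs, correct = [], []
    for _ in range(n):
        conf = rng.uniform(0.5, 1.0)
        probs.append([conf, 1.0 - conf])
        correct.append(rng.random() < conf)
    return probs, correct


probs, correct = make_synthetic_pool(2000)
scores = [least_confidence(p) for p in probs]

budget = 200
by_metric = select_top_k(scores, budget)
by_random = random.Random(1).sample(range(len(probs)), budget)

true_accuracy = sum(correct) / len(correct)
print("true accuracy on full pool:", round(true_accuracy, 3))
for name, picked in [("least-confidence", by_metric), ("random", by_random)]:
    print(
        name,
        "| fault detection rate:", round(fault_detection_rate(picked, correct), 3),
        "| estimated accuracy:", round(estimated_accuracy(picked, correct), 3),
    )
```

In this toy setting the uncertainty-based selection concentrates on inputs the model tends to get wrong, which helps fault detection, but the same bias makes the accuracy measured on that subset a poor estimate of accuracy on the whole pool. Whether a given metric behaves this way on real data and under distribution shift is what the paper evaluates.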

Abstract

Deep learning (DL)-based systems can exhibit unexpected behavior when exposed to out-of-distribution (OOD) scenarios, posing serious risks in safety-critical domains such as malware detection and autonomous driving. This underscores the importance of thoroughly testing such systems before deployment. To this end, researchers have proposed a wide range of test selection metrics designed to select inputs effectively. However, prior evaluations of these metrics have three key limitations: (1) narrow testing objectives: many studies assess metrics only for fault detection, leaving their effectiveness for performance estimation unclear; (2) limited coverage of OOD scenarios, with natural and label shifts rarely considered; and (3) biased dataset selection, where most work focuses on image data while other modalities remain underexplored. Consequently, a unified benchmark that…
