ODP-Bench: Benchmarking Out-of-Distribution Performance Prediction
Han Yu, Kehan Li, Dongbai Li, Yue He, Xingxuan Zhang, Peng Cui

TL;DR
ODP-Bench offers a comprehensive and standardized benchmark for evaluating out-of-distribution performance prediction algorithms across diverse datasets, facilitating fair comparisons and advancing research in this critical area.
Contribution
This work introduces ODP-Bench, a unified benchmark with trained models and diverse datasets, addressing evaluation inconsistencies and enabling systematic assessment of OOD performance prediction methods.
Findings
Benchmark covers most common OOD datasets
Provides trained models for consistent evaluation
Experimental analysis reveals performance boundaries
Abstract
Out-of-Distribution (OOD) performance prediction has recently attracted growing attention. Its goal is to predict the performance of trained models on unlabeled OOD test datasets, so that off-the-shelf trained models can be better leveraged and deployed in risk-sensitive scenarios. Although progress has been made in this area, evaluation protocols in previous literature are inconsistent, and most works cover only a limited number of real-world OOD datasets and types of distribution shifts. To provide convenient and fair comparisons among algorithms, we propose the Out-of-Distribution Performance Prediction Benchmark (ODP-Bench), a comprehensive benchmark that includes the most commonly used OOD datasets and existing practical performance prediction algorithms. We provide our trained models as a testbench for future researchers, thus guaranteeing the consistency of…
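To make the task in the abstract concrete, here is a minimal sketch of one simple label-free estimator: the average of the model's maximum softmax probability over the unlabeled test set, used as a proxy for accuracy. This is an illustration under stated assumptions, not ODP-Bench's own code. The function names `softmax` and `average_confidence` and the toy logits are hypothetical, and the source does not say which algorithms the benchmark includes.

```python
import math


def softmax(logits):
    """Convert one row of logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_confidence(batch_logits):
    """Estimate accuracy on unlabeled data as the mean max-softmax probability.

    Assumption: the model is reasonably calibrated, so its confidence
    roughly tracks its chance of being correct. No labels are used.
    """
    confidences = [max(softmax(row)) for row in batch_logits]
    return sum(confidences) / len(confidences)


# Hypothetical logits from a 3-class model on three unlabeled OOD samples.
toy_logits = [
    [4.0, 0.5, 0.1],  # confident prediction
    [0.2, 0.3, 0.1],  # near-uniform, low confidence
    [1.0, 3.0, 0.0],  # moderately confident
]
predicted_accuracy = average_confidence(toy_logits)
print(f"Predicted accuracy (no labels used): {predicted_accuracy:.3f}")
```

An estimator like this would be scored by comparing its predictions with the true accuracy measured on labeled copies of the same OOD datasets, which is the kind of comparison a shared set of trained models makes consistent across papers.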
Taxonomy
Topics: Image and Video Quality Assessment · Software System Performance and Reliability · Imbalanced Data Classification Techniques
