Distributional bias compromises leave-one-out cross-validation

George I. Austin; Itsik Pe'er; Tal Korem

arXiv:2406.01652·stat.ME·March 25, 2025

TL;DR

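This paper shows that leave-one-out cross-validation induces a distributional bias, a negative correlation between the average label of each training fold and the label of its held-out test instance, which distorts performance estimates. It proposes a rebalanced cross-validation scheme that corrects this bias and improves evaluation accuracy.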

Contribution

The paper reveals a distributional bias in leave-one-out cross-validation and introduces a rebalanced approach to mitigate its impact on performance evaluation.

Findings

1. Leave-one-out cross-validation creates a negative correlation between the average label of each training fold and the label of its test instance (distributional bias).

2. The bias affects hyperparameter tuning and model assessment.

3. Rebalanced cross-validation improves evaluation accuracy in various scenarios.
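
The findings above can be illustrated with a small numerical sketch. It is not the authors' implementation: the helper names (`loo_training_means`, `rebalanced_training_means`, `auroc`) and the toy labels are hypothetical, and the rebalancing rule used here (also dropping one random training instance whose label is opposite to the test instance's) is an assumption chosen because it keeps the training label mean constant across folds. The "model" is a dummy that predicts the mean training label, so any signal it appears to have comes purely from how the folds are constructed.

```python
import numpy as np

def loo_training_means(y):
    """Mean training label of each leave-one-out fold (fold i holds out y[i])."""
    y = np.asarray(y, dtype=float)
    return (y.sum() - y) / (y.size - 1)

def rebalanced_training_means(y, seed=0):
    """Hypothetical rebalancing (an assumption, not the paper's code):
    besides the test instance, drop one random training instance with the
    opposite label, so every training fold has the same label mean."""
    y = np.asarray(y, dtype=int)
    rng = np.random.default_rng(seed)
    means = np.empty(y.size)
    for i in range(y.size):
        opposite = np.flatnonzero(y != y[i])  # candidates to drop
        drop = rng.choice(opposite)
        keep = np.ones(y.size, dtype=bool)
        keep[[i, drop]] = False
        means[i] = y[keep].mean()
    return means

def auroc(y, scores):
    """Rank-based auROC; tied positive/negative pairs count as 0.5."""
    y = np.asarray(y)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)

# Toy binary labels: 6 negatives, 4 positives.
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])

loo = loo_training_means(y)         # dummy predictions under plain LOOCV
reb = rebalanced_training_means(y)  # dummy predictions under the rebalanced folds

print("corr(test label, LOO training mean):", np.corrcoef(y, loo)[0, 1])
print("auROC of mean predictor, LOOCV:     ", auroc(y, loo))
print("auROC of mean predictor, rebalanced:", auroc(y, reb))
```

On these labels the dummy predictor scores an auROC of 0 under plain leave-one-out, even though it has learned nothing, because each positive test instance is paired with a lower training mean than each negative one. Under the rebalanced folds every prediction is identical, and the auROC returns to the uninformative 0.5.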

Abstract

Cross-validation is a common method for estimating the predictive performance of machine learning models. In a data-scarce regime, where one typically wishes to maximize the number of instances used for training the model, an approach called "leave-one-out cross-validation" is often used. In this design, a separate model is built for predicting each data instance after training on all other instances. Since this results in a single test instance available per model trained, predictions are aggregated across the entire dataset to calculate common performance metrics such as the area under the receiver operating characteristic curve or R² scores. In this work, we demonstrate that this approach creates a negative correlation between the average label of each training fold and the label of its corresponding test instance, a phenomenon that we term distributional bias. As machine learning models…
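
The negative correlation described in the abstract follows from simple arithmetic. As a brief sketch, using notation introduced here for illustration only (n instances with labels y_1, …, y_n, overall label mean ȳ, and ȳ₋ᵢ for the mean training label when instance i is held out):

```latex
\bar{y}_{-i}
  \;=\; \frac{1}{n-1} \sum_{j \neq i} y_j
  \;=\; \frac{n\,\bar{y} - y_i}{n-1}
  \;=\; \bar{y} \;-\; \frac{y_i - \bar{y}}{n-1}
```

Because the dataset-wide mean is fixed, the training mean of each fold is a decreasing affine function of the held-out label, with slope -1/(n-1). Whenever the labels are not all identical, the Pearson correlation between the held-out label and its training-fold mean is therefore exactly -1, and the gap between folds shrinks only as n grows.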

Code & Models

Repositories

korem-lab/rebalancedcv