# Cross-Validation for Correlated Data

**Authors:** Assaf Rabinowicz, Saharon Rosset

arXiv: 1904.02438 · 2021-08-10

## TL;DR

This paper examines the limitations of standard cross-validation with correlated data and proposes a bias-corrected estimator, $CV_c$, to improve model evaluation and selection in such settings.

## Contribution

The paper introduces a criterion for deciding when standard K-fold CV remains suitable for correlated data, and develops a bias-corrected estimator, $CV_c$, for the cases where that criterion fails.
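
The following is a sketch of why correlation matters. It rests on simplifying assumptions that are not taken from the paper. Suppose the prediction for a held-out point is linear in the training responses, $\hat{y}_{te} = h^\top y_{tr}$, where the weight vector $h$ depends only on the covariates. Let $y^*$ be the response we actually want to predict, with the same mean and variance as the held-out $y_{te}$. Expanding both squared errors and subtracting gives

$$
E\left(y_{te} - \hat{y}_{te}\right)^2 - E\left(y^* - \hat{y}_{te}\right)^2
= 2\, h^\top \left[ \mathrm{Cov}(y_{tr}, y^*) - \mathrm{Cov}(y_{tr}, y_{te}) \right].
$$

The symbols $h$, $y_{tr}$, $y_{te}$, and $y^*$ are illustrative notation introduced here. Under these assumptions the gap vanishes when the held-out point and the real prediction target are correlated with the training data in the same way, and it is nonzero otherwise. When $y^*$ is uncorrelated with $y_{tr}$ but $y_{te}$ is positively correlated with it through $h$, the CV error is too optimistic. The paper's exact criterion and the form of $CV_c$ are given in the full text.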

## Key findings

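- Standard K-fold CV with squared error loss can give a biased estimate of prediction error when the observations are correlated.
- The proposed $CV_c$ estimator yields an unbiased estimate of prediction error in many settings where standard CV is invalid.
- In simulations and real-data studies, applying the correction substantially improves both model evaluation and model selection.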
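
The first finding can be illustrated with a small simulation. This is a minimal sketch under assumed settings, not the paper's experiments and not an implementation of $CV_c$. The data are clustered with a random intercept per cluster, the model is a plain mean predictor, and the function name `simulate` and all parameter values are hypothetical choices made for illustration. Random K-fold CV places observations from the same cluster in both the training and test folds, so its error estimate is compared with the error on a fresh, previously unseen cluster.

```python
import numpy as np


def simulate(n_clusters=5, cluster_size=20, sigma_b=1.0, sigma_e=1.0,
             n_folds=5, n_reps=2000, seed=0):
    """Compare random K-fold CV error with the error on a new cluster.

    Illustrative setup (an assumption, not the paper's design):
    y_ij = b_i + e_ij with b_i ~ N(0, sigma_b^2) and e_ij ~ N(0, sigma_e^2),
    and the model predicts every point with the training mean.
    """
    rng = np.random.default_rng(seed)
    n = n_clusters * cluster_size
    cluster = np.repeat(np.arange(n_clusters), cluster_size)
    cv_err = np.empty(n_reps)
    new_err = np.empty(n_reps)

    for r in range(n_reps):
        # Draw one clustered data set.
        b = rng.normal(0.0, sigma_b, n_clusters)
        y = b[cluster] + rng.normal(0.0, sigma_e, n)

        # Standard K-fold CV with folds assigned at random, ignoring clusters.
        folds = rng.permutation(n) % n_folds
        sq = np.empty(n)
        for k in range(n_folds):
            test = folds == k
            sq[test] = (y[test] - y[~test].mean()) ** 2
        cv_err[r] = sq.mean()

        # Target: squared error on observations from a brand-new cluster,
        # which share no random effect with the training data.
        y_new = rng.normal(0.0, sigma_b) + rng.normal(0.0, sigma_e, cluster_size)
        new_err[r] = ((y_new - y.mean()) ** 2).mean()

    return cv_err.mean(), new_err.mean()


if __name__ == "__main__":
    cv, new = simulate()
    print(f"mean random K-fold CV error: {cv:.3f}")
    print(f"mean new-cluster error:      {new:.3f}")
```

In this setup the averaged CV error should come out noticeably below the new-cluster error, because each test point shares a cluster effect with part of its training fold. If the prediction target were instead new observations from the same clusters, the comparison would change, which is why a criterion for when standard CV is appropriate is useful.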

## Abstract

K-fold cross-validation (CV) with squared error loss is widely used for evaluating predictive models, especially when strong distributional assumptions cannot be made. However, CV with squared error loss is not free from distributional assumptions, in particular in cases involving non-i.i.d. data. This paper analyzes CV for correlated data. We present a criterion for the suitability of standard CV in the presence of correlations. When this criterion does not hold, we introduce a bias-corrected cross-validation estimator, which we term $CV_c$, that yields an unbiased estimate of prediction error in many settings where standard CV is invalid. We also demonstrate our results numerically, and find that introducing our correction substantially improves both model evaluation and model selection in simulations and real data studies.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.02438/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/1904.02438/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/1904.02438/full.md

---
Source: https://tomesphere.com/paper/1904.02438