A study of pre-validation
Holger Höfling, Robert Tibshirani

TL;DR
This paper analyzes pre-validation for predictors derived from high-dimensional data. It shows that the standard analytical test can be biased and proposes a permutation test that restores valid inference, with microarray studies as the main application.
Contribution
It provides an analytical assessment of pre-validation, identifies bias in the standard "one degree of freedom" test, and introduces a permutation test to correct it.
Findings
Pre-validation generally performs well.
The standard "one degree of freedom" analytical test can be biased.
In simulations, the permutation test attains the nominal level with roughly the same power as the analytical test.
Abstract
Given a predictor of outcome derived from a high-dimensional dataset, pre-validation is a useful technique for comparing it to competing predictors on the same dataset. For microarray data, it allows one to compare a newly derived predictor for disease outcome to standard clinical predictors on the same dataset. We study pre-validation analytically to determine if the inferences drawn from it are valid. We show that while pre-validation generally works well, the straightforward "one degree of freedom" analytical test from pre-validation can be biased and we propose a permutation test to remedy this problem. In simulation studies, we show that the permutation test has the nominal level and achieves roughly the same power as the analytical test.
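To make the procedure concrete, here is a minimal pure-Python sketch built on assumptions that do not come from the paper: a continuous outcome `y`, a single clinical covariate `z`, and a hypothetical predictor-building rule (`fit_predictor`) that keeps the features most correlated with `y` and weights them by that correlation. `pre_validate` gives each sample a score from a predictor fit without that sample's fold, with feature selection redone inside every fold. The statistic is the absolute partial correlation of the pre-validated score with `y` given `z`, a monotone function of the |t| statistic for the score's coefficient in the regression of `y` on `z` and the score (the one-degree-of-freedom comparison). The permutation scheme shown here shuffles the rows of `X` while keeping `(y, z)` paired and then reruns the entire pre-validation. It is one natural choice under the null that `X` adds nothing beyond `z`, offered as an illustration rather than as the authors' exact scheme.

```python
import math
import random


def mean(v):
    return sum(v) / len(v)


def corr(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb) if saa > 0 and sbb > 0 else 0.0


def residualize(a, b):
    """Residuals from the least-squares regression of a on b with an intercept."""
    ma, mb = mean(a), mean(b)
    sbb = sum((bi - mb) ** 2 for bi in b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    slope = sab / sbb if sbb > 0 else 0.0
    return [ai - ma - slope * (bi - mb) for ai, bi in zip(a, b)]


def fit_predictor(X, y, n_features):
    """Hypothetical rule: keep the n_features columns most correlated with y,
    each weighted by its correlation (a simple compound-covariate score)."""
    scored = [(corr([row[j] for row in X], y), j) for j in range(len(X[0]))]
    scored.sort(key=lambda t: abs(t[0]), reverse=True)
    return scored[:n_features]


def predict(weights, row):
    return sum(w * row[j] for w, j in weights)


def pre_validate(X, y, n_folds, n_features, rng):
    """Score each sample with a predictor fit, feature selection included,
    on the other folds only."""
    n = len(y)
    order = list(range(n))
    rng.shuffle(order)
    pv = [0.0] * n
    for k in range(n_folds):
        held = set(order[k::n_folds])
        train = [i for i in range(n) if i not in held]
        w = fit_predictor([X[i] for i in train], [y[i] for i in train], n_features)
        for i in held:
            pv[i] = predict(w, X[i])
    return pv


def adjusted_stat(pv, y, z):
    """|partial correlation| of pv and y given z; monotone in the |t| of pv's
    coefficient in the regression y ~ 1 + z + pv (Frisch-Waugh)."""
    return abs(corr(residualize(pv, z), residualize(y, z)))


def permutation_test(X, y, z, n_folds=5, n_features=10, n_perm=200, seed=0):
    """Permutation p-value for the pre-validated predictor, adjusted for z.
    Assumed scheme: shuffle rows of X, keep (y, z) paired, redo pre-validation."""
    rng = random.Random(seed)
    observed = adjusted_stat(pre_validate(X, y, n_folds, n_features, rng), y, z)
    exceed = 0
    for _ in range(n_perm):
        perm = list(range(len(y)))
        rng.shuffle(perm)
        Xp = [X[i] for i in perm]  # break the X-(y, z) link, keep y-z intact
        stat = adjusted_stat(pre_validate(Xp, y, n_folds, n_features, rng), y, z)
        if stat >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)
```

Calling `permutation_test(X, y, z)` with `X` as a list of sample rows returns the observed statistic and a permutation p-value. The p-value uses the usual `(1 + count) / (1 + n_perm)` form, so it is never exactly zero. Rerunning the whole pre-validation inside each permutation, rather than reusing one fitted predictor, is what lets the null distribution reflect the same selection and fitting steps as the observed statistic.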
