A study of pre-validation

Holger Höfling; Robert Tibshirani

arXiv:0807.4105·stat.AP·July 28, 2008

TL;DR

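This paper studies pre-validation, a technique for comparing a predictor derived from high-dimensional data (such as microarrays) with competing predictors on the same dataset. It shows that the straightforward "one degree of freedom" analytical test can be biased and proposes a permutation test that attains the nominal level in simulations.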

Contribution

It examines analytically whether inferences drawn from pre-validation are valid, shows that the "one degree of freedom" analytical test can be biased, and proposes a permutation test to remedy the bias.

Findings

01

Pre-validation generally works well as a basis for inference.

02

The straightforward "one degree of freedom" analytical test from pre-validation can be biased.

03

In simulations, the proposed permutation test has the nominal level and roughly the same power as the analytical test.

Abstract

Given a predictor of outcome derived from a high-dimensional dataset, pre-validation is a useful technique for comparing it to competing predictors on the same dataset. For microarray data, it allows one to compare a newly derived predictor for disease outcome to standard clinical predictors on the same dataset. We study pre-validation analytically to determine if the inferences drawn from it are valid. We show that while pre-validation generally works well, the straightforward "one degree of freedom" analytical test from pre-validation can be biased and we propose a permutation test to remedy this problem. In simulation studies, we show that the permutation test has the nominal level and achieves roughly the same power as the analytical test.
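The workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: ridge regression as the internal high-dimensional predictor, a linear model for the outcome, the hypothetical function names `prevalidate`, `t_stat_last`, and `permutation_pvalue`, and a permutation scheme that shuffles the rows of the expression matrix while keeping the clinical covariates and the outcome aligned. The abstract does not specify the permutation scheme, so the paper's actual test may differ.

```python
import numpy as np

def prevalidate(X, y, n_folds=5, ridge=1.0, rng=None):
    """Pre-validated predictor: each observation is predicted by a model
    fitted on the folds that do not contain it (ridge is an assumption)."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    folds = np.array_split(rng.permutation(n), n_folds)
    z = np.empty(n)
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        Xt, yt = X[train], y[train]
        mu_x, mu_y = Xt.mean(axis=0), yt.mean()
        Xc = Xt - mu_x
        # Ridge in dual form, which is cheap when features outnumber samples.
        alpha = np.linalg.solve(Xc @ Xc.T + ridge * np.eye(len(train)), yt - mu_y)
        beta = Xc.T @ alpha
        z[held_out] = mu_y + (X[held_out] - mu_x) @ beta
    return z

def t_stat_last(design, y):
    """t-statistic for the last column's coefficient in an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), design])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = resid @ resid / (A.shape[0] - A.shape[1])
    cov = sigma2 * np.linalg.inv(A.T @ A)
    return coef[-1] / np.sqrt(cov[-1, -1])

def permutation_pvalue(X, clinical, y, n_perm=200, seed=0):
    """Compare the observed |t| of the pre-validated predictor with its
    distribution when rows of X are shuffled (assumed permutation scheme)."""
    rng = np.random.default_rng(seed)

    def stat(X_mat):
        # Pre-validation is redone from scratch for every permuted dataset.
        z = prevalidate(X_mat, y, rng=rng)
        return abs(t_stat_last(np.column_stack([clinical, z]), y))

    observed = stat(X)
    null = np.array([stat(X[rng.permutation(len(y))]) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value
```

Here the analytical test corresponds to referring the t-statistic of the pre-validated predictor to its usual reference distribution, whereas the permutation version recomputes the entire pre-validation procedure on each shuffled dataset and uses the resulting empirical null instead.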
