Correcting for selection bias via cross-validation in the classification of microarray data

G. J. McLachlan; J. Chevelu; J. Zhu

arXiv:0805.2501 · math.ST · December 18, 2008

TL;DR

This paper addresses selection bias in the classification of microarray data. It presents explicit formulas that show how cross-validation must be structured, with gene selection repeated inside each training fold, so that the estimated error rates of diagnostic rules are not optimistically biased.

Contribution

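It presents explicit formulas for cross-validated error-rate estimation that make clear which steps, including gene selection, must be repeated within each layer of validation, so that the resulting estimates are properly cross-validated and corrected for selection bias.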
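The contrast at the heart of the problem can be written down compactly. The block below uses assumed standard notation, not necessarily the paper's own formulae: $t$ is the training set of $n$ tissue samples, $F_1,\dots,F_K$ are the cross-validation folds, $t_{(k)}$ is $t$ with fold $F_k$ removed, $y_j$ and $z_j$ are the expression profile and class of sample $j$, $\hat{S}(\cdot)$ is the gene subset selected from the data in its argument, $r(y; t_{(k)}, S)$ is the rule formed from $t_{(k)}$ using genes $S$, and $Q[u,v]$ equals 1 if $u \neq v$ and 0 otherwise.

```latex
\begin{aligned}
A_{\mathrm{ext}} &= \frac{1}{n}\sum_{k=1}^{K}\sum_{j \in F_k}
  Q\bigl[z_j,\; r\bigl(y_j;\, t_{(k)},\, \hat{S}(t_{(k)})\bigr)\bigr], \\
A_{\mathrm{int}} &= \frac{1}{n}\sum_{k=1}^{K}\sum_{j \in F_k}
  Q\bigl[z_j,\; r\bigl(y_j;\, t_{(k)},\, \hat{S}(t)\bigr)\bigr].
\end{aligned}
```

The two estimates differ only in the argument of $\hat{S}$. In $A_{\mathrm{int}}$ the genes are chosen using all of $t$, including the held-out fold, so the held-out samples have already influenced the rule and the estimate is optimistically biased; $A_{\mathrm{ext}}$ avoids this by redoing the selection on each $t_{(k)}$.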

Findings

01. Explicit formulas for bias correction in cross-validation
02. Clarification of the validation layers needed to avoid improperly cross-validated estimates
03. More accurate, less optimistically biased error-rate estimates for diagnostic rules
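One hedged reading of "layers of validation", sketched here in assumed notation rather than quoted from the paper: if the subset size $m$ is itself chosen by minimising an externally cross-validated error, then reporting that minimum reintroduces a selection bias, because $m$ was tuned on the same held-out samples. A second, outer layer removes it by re-choosing $m$ from each reduced training set alone. Here $t_{(k)}$ is the training set with fold $F_k$ removed, $\hat{S}_m(\cdot)$ is the top-$m$ gene subset selected from its argument, $r$ is the rule, $Q[u,v]$ is the 0/1 loss, and $A_{\mathrm{ext}}(m;\, t_{(k)})$ is an inner external cross-validated error computed using only $t_{(k)}$.

```latex
A_{\mathrm{nest}} = \frac{1}{n}\sum_{k=1}^{K}\sum_{j \in F_k}
  Q\bigl[z_j,\; r\bigl(y_j;\, t_{(k)},\, \hat{S}_{\hat{m}_k}(t_{(k)})\bigr)\bigr],
\qquad
\hat{m}_k = \arg\min_{m}\, A_{\mathrm{ext}}\bigl(m;\, t_{(k)}\bigr).
```

Every data-dependent choice (which genes, how many genes) is made inside $t_{(k)}$, so the samples in $F_k$ are never used to build or tune the rule that classifies them.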

Abstract

There is increasing interest in the use of diagnostic rules based on microarray data. These rules are formed by considering the expression levels of thousands of genes in tissue samples taken on patients of known classification with respect to a number of classes, representing, say, disease status or treatment strategy. As the final versions of these rules are usually based on a small subset of the available genes, there is a selection bias that has to be corrected for in the estimation of the associated error rates. We consider the problem using cross-validation. In particular, we present explicit formulae that are useful in explaining the layers of validation that have to be performed in order to avoid improperly cross-validated estimates.
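The selection bias described in the abstract can be reproduced in a few lines. This is a minimal sketch under stated assumptions, not the authors' method: the data are simulated null data (expression values unrelated to class), genes are ranked by absolute difference in class means as a simple stand-in for a t-statistic, and the classifier is nearest centroid. All function names are hypothetical. The sketch compares an improperly cross-validated estimate, where genes are selected once on all samples, with one where selection is redone on each training fold.

```python
import random
import statistics


def simulate_null_data(n=40, p=1000, seed=0):
    """Hypothetical null data: n samples x p genes of N(0, 1) noise unrelated to class."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]  # two balanced classes
    return X, y


def select_genes(X, y, m):
    """Return the m genes with the largest absolute difference in class means."""
    scores = []
    for g in range(len(X[0])):
        mean0 = statistics.fmean(row[g] for row, c in zip(X, y) if c == 0)
        mean1 = statistics.fmean(row[g] for row, c in zip(X, y) if c == 1)
        scores.append((abs(mean0 - mean1), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:m]]


def fit_nearest_centroid(X, y, genes):
    """Compute class centroids on the given genes; return a prediction function."""
    centroids = {}
    for c in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == c]
        centroids[c] = [statistics.fmean(row[g] for row in rows) for g in genes]

    def predict(x):
        # Assign x to the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda c: sum((x[g] - mu) ** 2 for g, mu in zip(genes, centroids[c])),
        )

    return predict


def cv_error(X, y, m, k=5, external=True):
    """K-fold CV error rate. external=True redoes gene selection on each training
    fold; external=False selects genes once on all samples (internal, biased)."""
    n = len(X)
    folds = [list(range(i, n, k)) for i in range(k)]
    genes_all = None if external else select_genes(X, y, m)
    errors = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        genes = select_genes(X_train, y_train, m) if external else genes_all
        predict = fit_nearest_centroid(X_train, y_train, genes)
        errors += sum(predict(X[i]) != y[i] for i in test_idx)
    return errors / n


X, y = simulate_null_data()
err_internal = cv_error(X, y, m=10, external=False)
err_external = cv_error(X, y, m=10, external=True)
print(f"internal CV error (selection outside the folds): {err_internal:.3f}")
print(f"external CV error (selection inside each fold):  {err_external:.3f}")
```

Because the simulated classes carry no signal, a properly cross-validated error near 0.5 is expected; the internal estimate typically comes out far lower, which is exactly the optimistic selection bias the abstract warns about.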