Efficient, adaptive cross-validation for tuning and comparing models, with application to drug discovery

Hui Shen; William J. Welch; Jacqueline M. Hughes-Oliver

arXiv:1202.6536 · stat.AP · March 1, 2012

TL;DR

This paper introduces an efficient, adaptive cross-validation (CV) method for tuning and comparing models. It reduces the computational cost of reliable CV comparisons, which matters most in large-scale applications such as drug discovery.

Contribution

The paper develops a sequential, adaptive CV approach that quickly eliminates poorly performing models and accounts for the randomness introduced by the CV data splits, making model selection more efficient.
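To make the idea of sequential elimination concrete, here is a minimal Python sketch. It is not the authors' algorithm: it assumes repeated CV runs on shared random splits, a paired comparison of each model against the current best, a normal approximation for the test statistic, and a simple Bonferroni adjustment within each round (repeated looks across rounds are not adjusted for). The names `sequential_elimination`, `toy_cv`, and `TRUE_ERROR` are hypothetical, and the error values are made up for the demo.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def sequential_elimination(models, run_cv, max_runs=20, min_runs=3, alpha=0.05, seed=0):
    """Drop models whose CV error is significantly worse than the current best.

    models : list of hashable model labels
    run_cv : callable(model, split_seed) -> CV error for one random split
    Returns (surviving models, number of CV runs used). Requires min_runs >= 2.
    """
    rng = random.Random(seed)
    alive = list(models)
    errors = {m: [] for m in alive}
    runs = 0
    while runs < max_runs and len(alive) > 1:
        split_seed = rng.randrange(10**9)
        # Every surviving model sees the same split, so differences are paired.
        for m in alive:
            errors[m].append(run_cv(m, split_seed))
        runs += 1
        if runs < min_runs:
            continue
        best = min(alive, key=lambda m: mean(errors[m]))
        # Bonferroni adjustment over the comparisons made in this round.
        z_crit = NormalDist().inv_cdf(1 - alpha / (len(alive) - 1))
        survivors = []
        for m in alive:
            if m == best:
                survivors.append(m)
                continue
            diffs = [e - b for e, b in zip(errors[m], errors[best])]
            se = stdev(diffs) / math.sqrt(len(diffs))
            # Normal approximation to the paired test (an assumption of this sketch).
            worse = mean(diffs) > 0 if se == 0 else mean(diffs) / se > z_crit
            if not worse:
                survivors.append(m)
        alive = survivors
    return alive, runs


# Toy stand-in for a real CV run: made-up true errors plus noise.
TRUE_ERROR = {"a": 0.10, "b": 0.20, "c": 0.30}


def toy_cv(model, split_seed):
    # Noise shared by all models on the same split (removed by pairing).
    split_effect = random.Random(split_seed).gauss(0, 0.03)
    # Noise specific to this model on this split.
    model_noise = random.Random(f"{model}-{split_seed}").gauss(0, 0.01)
    return TRUE_ERROR[model] + split_effect + model_noise


survivors, runs_used = sequential_elimination(list(TRUE_ERROR), toy_cv)
```

Because every model is evaluated on the same split in each round, the split-to-split noise cancels in the paired differences. That is why a clearly inferior model can be dropped after only a few runs in this toy setting.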

Findings

01. Reduces the computational burden of large-scale CV comparisons.

02. Can establish a model's inferiority with minimal CV runs.

03. Improves the reliability of model selection in drug discovery.

Abstract

Cross-validation (CV) is widely used for tuning a model with respect to user-selected parameters and for selecting a "best" model. For example, the method of $k$-nearest neighbors requires the user to choose $k$, the number of neighbors, and a neural network has several tuning parameters controlling the network complexity. Once such parameters are optimized for a particular data set, the next step is often to compare the various optimized models and choose the method with the best predictive performance. Both tuning and model selection boil down to comparing models, either across different values of the tuning parameters or across different classes of statistical models and/or sets of explanatory variables. For multiple large sets of data, like the PubChem drug discovery cheminformatics data which motivated this work, reliable CV comparisons are computationally demanding, or even…
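As an illustration of the tuning task the abstract describes, the following Python sketch chooses $k$ for a $k$-nearest-neighbors classifier with a single run of 5-fold CV. It is a toy example under stated assumptions, not the paper's procedure: the data are synthetic one-dimensional points, the candidate values of $k$ are arbitrary, and the helper names (`knn_predict`, `cv_error`, `make_toy_data`) are hypothetical.

```python
import random
from collections import Counter


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D features)."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def cv_error(data, k, n_folds=5, seed=0):
    """Misclassification rate of k-NN under one random n_folds-fold split."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    wrong = 0
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        wrong += sum(knn_predict(train, data[i][0], k) != data[i][1] for i in fold)
    return wrong / len(data)


def make_toy_data(n=200, seed=1):
    """Two classes with 1-D Gaussian features centered at 0 and 2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        data.append((rng.gauss(2.0 * label, 1.0), label))
    return data


data = make_toy_data()
candidates = [1, 5, 15, 45]  # odd values avoid tied votes with two classes
errors = {k: cv_error(data, k) for k in candidates}
best_k = min(errors, key=errors.get)
```

Changing `seed` in `cv_error` changes the split and, in general, the estimated errors. A single CV run can therefore rank the candidates differently from another run, which is the CV randomness that a reliable comparison has to account for.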
