Test Error Estimation after Model Selection Using Validation Error

Leying Guan

arXiv:1801.02817 · stat.ME · February 13, 2018 · 2 cites

Open Access

TL;DR

This paper introduces two simple, computationally cheap methods for estimating test error after model selection. They correct the downward bias of the minimum validation error, and the bootstrap is used to construct confidence intervals for the test error.

Contribution

It proposes novel bias-corrected test error estimation methods that do not require model refitting and are applicable after model selection via validation error.

Findings

01. In the sample-splitting setting, the proposed test error estimates have biases of size o(1/√n) under suitable assumptions.

02. Bootstrap confidence intervals quantify the uncertainty of the test error estimates.

03. The methods perform well in simulations.

Abstract

When performing supervised learning with the model selected using validation error from sample splitting and cross validation, the minimum value of the validation error can be biased downward. We propose two simple methods that use the errors produced in the validation step to estimate the test error after model selection, and we focus on the situations where we select the model by minimizing the validation error and the randomized validation error. Our methods do not require model refitting, and the additional computational cost is negligible. In the setting of sample splitting, we show that the proposed test error estimates have biases of size $o(1/\sqrt{n})$ under suitable assumptions. We also propose to use the bootstrap to construct confidence intervals for the test error based on this result. We apply our proposed methods to a number of simulations and examine their performance.
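
The downward bias described in the abstract can be seen in a small simulation. The sketch below is illustrative only and is not the paper's method: it assumes a hypothetical setup in which every candidate model has the same true misclassification rate and the models' validation losses are independent (in practice they share one validation set and are correlated). The function name `simulate_min_validation_bias` and all parameter values are assumptions made for this example.

```python
import random
import statistics


def simulate_min_validation_bias(n_models=20, n_val=100, true_error=0.3,
                                 n_reps=500, seed=0):
    """Monte Carlo estimate of E[min_j validation_error_j] - true_error.

    Assumption: all candidate models share the same true error rate, so any
    gap between the selected model's validation error and `true_error`
    comes purely from taking the minimum over noisy estimates.
    """
    rng = random.Random(seed)
    selected_errors = []
    for _ in range(n_reps):
        # Validation error of each model: mean of n_val Bernoulli(true_error)
        # 0/1 losses, drawn independently per model (a simplification).
        val_errors = [
            sum(rng.random() < true_error for _ in range(n_val)) / n_val
            for _ in range(n_models)
        ]
        # Model selection step: keep the smallest validation error.
        selected_errors.append(min(val_errors))
    return statistics.mean(selected_errors) - true_error


bias = simulate_min_validation_bias()
print(f"average (min validation error - true error): {bias:.3f}")
```

With a single candidate model (`n_models=1`) there is no selection and the average gap should be close to zero. With many candidates the gap is negative, which is the bias the proposed estimators are designed to correct.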

Taxonomy

Topics: Statistical Methods and Inference · Machine Learning and Algorithms · Gaussian Processes and Bayesian Inference