Estimation of prediction error with known covariate shift

Hui Xu; Robert Tibshirani

arXiv:2205.01849·stat.ME·September 30, 2022

Open Access

TL;DR

This paper addresses the estimation of prediction error on unlabeled test data under covariate shift, where cross-validation is biased, and proposes a parametric bootstrap method that empirically outperforms cross-validation in that setting.

Contribution

It introduces a parametric bootstrap approach to estimating prediction error when test features are available without outcome labels and the training and test feature distributions differ.

Findings

01

The proposed method outperforms cross-validation on both simulated data and a real data example.

02

It provides more accurate error estimates under covariate shift.

03

Empirical results show improved model selection performance.

Abstract

In supervised learning, the estimation of prediction error on unlabeled test data is an important task. Existing methods are usually built on the assumption that the training and test data are sampled from the same distribution, which is often violated in practice. As a result, traditional estimators like cross-validation (CV) will be biased, and this may result in poor model selection. In this paper, we assume that we have a test dataset in which the feature values are available but not the outcome labels, and focus on a particular form of distributional shift called "covariate shift". We propose an alternative method based on a parametric bootstrap of the target of conditional error. Empirically, our method outperforms CV on both simulated data and a real data example across different modeling tasks.
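To make the setting concrete, here is a minimal Python sketch of why CV can mislead under covariate shift and how a parametric bootstrap can use the unlabeled test features. It is not the authors' algorithm: it assumes a Gaussian linear model fit by least squares, and every name in it (`fit_ols`, `cv_error`, `bootstrap_test_error`, `demo`) and every simulation setting is hypothetical, chosen only for illustration.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients (no intercept, for brevity)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def cv_error(X, y, n_folds=5, rng=None):
    """K-fold CV estimate of squared error; only sees the training distribution."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(y))
    sq_errs = []
    for held_out in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, held_out)
        coef = fit_ols(X[train], y[train])
        sq_errs.append((y[held_out] - X[held_out] @ coef) ** 2)
    return float(np.mean(np.concatenate(sq_errs)))


def bootstrap_test_error(X_train, y_train, X_test, n_boot=200, rng=None):
    """Generic parametric bootstrap of squared error at the test features.

    Assumption (not from the paper): y = X @ coef + Gaussian noise.
    """
    rng = np.random.default_rng(rng)
    n, p = X_train.shape
    coef_hat = fit_ols(X_train, y_train)
    resid = y_train - X_train @ coef_hat
    sigma_hat = np.sqrt(resid @ resid / (n - p))  # residual standard deviation
    errs = np.empty(n_boot)
    for b in range(n_boot):
        # Simulate new training outcomes from the fitted model and refit.
        y_star = X_train @ coef_hat + sigma_hat * rng.standard_normal(n)
        coef_star = fit_ols(X_train, y_star)
        # Simulate outcomes at the observed (unlabeled) test features.
        y_test_star = X_test @ coef_hat + sigma_hat * rng.standard_normal(len(X_test))
        errs[b] = np.mean((y_test_star - X_test @ coef_star) ** 2)
    return float(errs.mean())


def demo(seed=0):
    """Simulated covariate shift: test features are 3x more spread out."""
    rng = np.random.default_rng(seed)
    n, m, p, sigma = 60, 500, 20, 1.0
    beta = rng.standard_normal(p)
    X_train = rng.standard_normal((n, p))
    X_test = 3.0 * rng.standard_normal((m, p))  # shifted covariate distribution
    y_train = X_train @ beta + sigma * rng.standard_normal(n)
    cv_est = cv_error(X_train, y_train, rng=rng)
    boot_est = bootstrap_test_error(X_train, y_train, X_test, rng=rng)
    return cv_est, boot_est
```

Calling `demo()` returns the pair `(cv_est, boot_est)`. In this simulated setup the test features lie further from the origin than the training features, so the fitted coefficients' estimation error is amplified at the test points. CV never sees those points, whereas the bootstrap evaluates each refit at the actual test features, so its estimate should come out larger.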

Taxonomy

Topics: Machine Learning and Data Classification · Gaussian Processes and Bayesian Inference · Domain Adaptation and Few-Shot Learning