Estimating prediction error for complex samples

Andrew Holbrook; Thomas Lumley; Daniel Gillen

arXiv:1711.04877 · stat.ME · September 17, 2019

TL;DR

This paper extends Efron's covariance penalty estimator to complex survey samples using Horvitz-Thompson weights, enabling accurate estimation of prediction error in non-representative data contexts.

Contribution

The paper introduces the Horvitz-Thompson-Efron (HTE) estimator, which adapts Efron's covariance penalty method to complex samples, and shows that the estimator is consistent for the true generalization error of the underlying superpopulation.

Findings

01. HTE estimator is consistent for true generalization error

02. Simulation studies validate the estimator's performance

03. Application to NHANES data illustrates practical utility

Abstract

With a growing interest in using non-representative samples to train prediction models for numerous outcomes, it is necessary to account for the sampling design that gives rise to the data in order to assess the generalized predictive utility of a proposed prediction rule. After learning a prediction rule based on a non-uniform sample, it is of interest to estimate the rule's error rate when applied to unobserved members of the population. Efron (1986) proposed a general class of covariance-penalty-inflated prediction error estimators that assume the available training data are representative of the target population to which the prediction rule is to be applied. We extend Efron's estimator to the complex sample context by incorporating Horvitz-Thompson sampling weights and show that it is consistent for the true generalization error rate when applied to the underlying superpopulation.…
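
To make the idea in the abstract concrete, the following is a minimal sketch of a covariance-penalty error estimate in which both the apparent error and the penalty are weighted by inverse inclusion probabilities. It is not the paper's HTE formula, which this page does not state. It assumes squared-error loss, a weighted least squares fit, homoscedastic noise, known inclusion probabilities, and a known population size; the function names `ht_mean` and `weighted_cov_penalty_error` are hypothetical.

```python
import numpy as np


def ht_mean(values, incl_prob, pop_size):
    """Horvitz-Thompson estimate of a population mean from a sample."""
    values = np.asarray(values, dtype=float)
    incl_prob = np.asarray(incl_prob, dtype=float)
    # Each sampled unit stands in for 1 / incl_prob population units.
    return np.sum(values / incl_prob) / pop_size


def weighted_cov_penalty_error(X, y, incl_prob, pop_size):
    """Weighted apparent error plus a Cp-style covariance penalty.

    Illustrative assumption, not the paper's estimator: the fit is
    weighted least squares and cov(y_hat_i, y_i) = sigma^2 * H_ii.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    incl_prob = np.asarray(incl_prob, dtype=float)
    w = 1.0 / incl_prob

    # Weighted least squares: beta = (X' W X)^{-1} X' W y.
    XtW = X.T * w
    A = XtW @ X
    beta = np.linalg.solve(A, XtW @ y)
    resid = y - X @ beta

    # Diagonal of the hat matrix H = X (X' W X)^{-1} X' W.
    h = np.einsum("ij,ij->i", X @ np.linalg.inv(A), X) * w

    # Plug-in noise variance (unweighted, for simplicity).
    n, p = X.shape
    sigma2 = np.sum(resid ** 2) / (n - p)

    apparent = ht_mean(resid ** 2, incl_prob, pop_size)
    penalty = 2.0 * ht_mean(sigma2 * h, incl_prob, pop_size)
    return apparent, penalty, apparent + penalty
```

When every unit has the same inclusion probability n/N, the weights cancel and the penalty reduces to the familiar 2σ²p/n term of Mallows' Cp, which is a useful sanity check on the weighted form.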
