Prediction Aided by Surrogate Training

Eric Xia; Martin J. Wainwright

arXiv:2412.09364·math.ST·December 13, 2024

TL;DR

This paper introduces PAST, a method that leverages helper covariates during training to improve prediction accuracy using only standard covariates at test time, with theoretical guarantees and empirical validation.

Contribution

The paper proposes PAST, a framework that first uses the labeled data to build a response estimator from both standard and helper covariates, then trains a predictor on standard covariates alone using the resulting pseudo-responses. The approach comes with theoretical error bounds.

Findings

01. Theoretical bounds on the prediction error of PAST.

02. Empirical improvements demonstrated across diverse applications.

03. Characterization of regimes where PAST approaches oracle accuracy.

Abstract

We study a class of prediction problems in which relatively few observations have associated responses, but all observations include both standard covariates as well as additional "helper" covariates. While the end goal is to make high-quality predictions using only the standard covariates, helper covariates can be exploited during training to improve prediction. Helper covariates arise in many applications, including forecasting in time series; incorporation of biased or mis-calibrated predictions from foundation models; and sharing information in transfer learning. We propose "prediction aided by surrogate training" (PAST), a class of methods that exploit labeled data to construct a response estimator based on both the standard and helper covariates; and then use the full dataset with pseudo-responses to train a predictor based only on standard covariates. We establish…
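The two-stage procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes ordinary least squares for both stages, and the names `fit_linear`, `predict_linear`, and `past_fit` are hypothetical. It also assumes pseudo-responses replace the observed responses on every observation, labeled or not; the paper's class of methods may treat the labeled points differently.

```python
import numpy as np


def fit_linear(features, targets):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef


def predict_linear(coef, features):
    """Apply coefficients from fit_linear to new features."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef


def past_fit(x_lab, w_lab, y_lab, x_all, w_all):
    """Two-stage sketch (assumed linear models in both stages).

    x_* are standard covariates, w_* are helper covariates.
    Stage 1: fit a response estimator on labeled data using (x, w).
    Stage 2: generate pseudo-responses for the full dataset and fit a
             predictor that uses the standard covariates x only.
    """
    surrogate = fit_linear(np.column_stack([x_lab, w_lab]), y_lab)
    pseudo = predict_linear(surrogate, np.column_stack([x_all, w_all]))
    return fit_linear(x_all, pseudo)
```

At test time only the standard covariates are needed: `predict_linear(coef, x_new)` with `coef` returned by `past_fit`. Any regression method could stand in for the least-squares fits in either stage; the linear choice here is only for brevity.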

Taxonomy

Topics: Machine Learning and Data Classification