Efficient and Multiply Robust Risk Estimation under General Forms of Dataset Shift

Hongxiang Qiu; Eric Tchetgen Tchetgen; Edgar Dobriban

arXiv:2306.16406 · stat.ME · June 11, 2024

TL;DR

This paper uses semiparametric efficiency theory to develop efficient, multiply robust methods for estimating the risk of a machine learning task under dataset shift, including covariate, label, and concept shift, and supports them with simulations.

Contribution

It introduces a unified framework for risk estimation under general dataset shifts, with new estimators, robustness properties, and tests for shift conditions.

Findings

01

Proposed efficient, multiply robust estimators for risk under dataset shift.

02

Derived efficiency bounds for posterior drift and location-scale shift.

03

Simulation results show improved accuracy when leveraging dataset shift assumptions.

Abstract

Statistical machine learning methods often face the challenge of limited data available from the population of interest. One remedy is to leverage data from auxiliary source populations, which share some conditional distributions or are linked in other ways with the target domain. Techniques leveraging such dataset shift conditions are known as domain adaptation or transfer learning. Despite the extensive literature on dataset shift, few works address how to efficiently use the auxiliary populations to improve the accuracy of risk evaluation for a given machine learning task in the target population. In this paper, we study the general problem of efficiently estimating target population risk under various dataset shift conditions, leveraging semiparametric efficiency theory. We consider a general class of dataset shift conditions, which includes three popular…
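To make the setting concrete, here is a minimal Python sketch of risk estimation under covariate shift, where the source and target populations share the conditional distribution of Y given X. It is not the paper's efficient, multiply robust estimator. It shows only the textbook importance-weighting idea, and everything in it is an assumption chosen for illustration: the Gaussian simulation design, the fixed predictor, the sample sizes, the known density ratio, and all function and variable names.

```python
import numpy as np

rng = np.random.default_rng(0)

def predictor(x):
    # Hypothetical fixed predictor whose target-population risk we evaluate.
    return 0.5 * x

def sq_loss(x, y):
    # Squared-error loss of the fixed predictor.
    return (y - predictor(x)) ** 2

def draw(mean, n):
    # Assumed design: X ~ N(mean, 1) and Y = X + N(0, 1) noise.
    # Y | X is identical in both populations (covariate shift).
    x = rng.normal(mean, 1.0, size=n)
    y = x + rng.normal(0.0, 1.0, size=n)
    return x, y

def density_ratio(x):
    # Ratio of the N(1, 1) target density to the N(0, 1) source density.
    # Assumed known here; in practice it would have to be estimated.
    return np.exp(x - 0.5)

x_src, y_src = draw(0.0, 200_000)  # large auxiliary source sample
x_tgt, y_tgt = draw(1.0, 2_000)    # small target sample

# Estimator 1: plain average of the loss over the small target sample.
risk_target_only = sq_loss(x_tgt, y_tgt).mean()

# Estimator 2: source losses reweighted by the density ratio.
risk_weighted_source = (density_ratio(x_src) * sq_loss(x_src, y_src)).mean()

# True target risk under this design: 0.25 * E[X^2] + 1 = 0.25 * 2 + 1.
true_risk = 1.5

print(risk_target_only, risk_weighted_source, true_risk)
```

Both estimators target the same quantity. The second one draws on the much larger source sample, which is the kind of gain from auxiliary data that the abstract describes, though the paper's own estimators are constructed differently.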

Taxonomy

Topics: Bayesian Methods and Mixture Models · Gaussian Processes and Bayesian Inference · Machine Learning in Healthcare