Robust importance-weighted cross-validation under sample selection bias

Wouter M. Kouw; Jesse H. Krijthe; Marco Loog

arXiv:1710.06514·cs.LG·August 28, 2019

TL;DR

Importance-weighted cross-validation corrects for sample selection bias, but it becomes unreliable when large weights occur with high probability. This paper analyzes the sampling variance of the importance-weighted risk estimator and introduces a control variate that makes it more robust to such weights.

Contribution

The paper analyzes the sampling variance of the importance-weighted risk estimator as a function of the training data distribution and proposes a control variate that makes the estimator more robust to large weights under sample selection bias.

Findings

01

The sampling variance of the importance-weighted risk estimator depends on the training data distribution.

02

Introducing a control variate reduces the impact of large weights.

03

The control variate improves hyperparameter estimates under sample selection bias.

Abstract

Cross-validation under sample selection bias can, in principle, be done by importance-weighting the empirical risk. However, the importance-weighted risk estimator produces sub-optimal hyperparameter estimates in problem settings where large weights arise with high probability. We study its sampling variance as a function of the training data distribution and introduce a control variate to increase its robustness to problematically large weights.
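
To make the idea concrete, here is a minimal sketch of an importance-weighted risk estimate with and without a control variate. The abstract does not say which control variate is used, so the sketch assumes one common choice: the importance weights themselves, whose expectation under the training distribution is 1 when they are exact density ratios. The function names `iw_risk` and `iw_risk_control_variate` and the Gaussian toy setup are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def iw_risk(losses, weights):
    """Plain importance-weighted risk estimate: mean of w_i * loss_i."""
    return float(np.mean(weights * losses))


def iw_risk_control_variate(losses, weights):
    """Importance-weighted risk corrected with a control variate.

    Assumption (not confirmed by the source): the control variate is the
    weight itself, with known training-distribution mean E[w] = 1.
    """
    wl = weights * losses
    var_w = np.var(weights, ddof=1)
    if var_w == 0.0:
        # Constant weights carry no information to correct with.
        return float(np.mean(wl))
    # Regression coefficient that minimizes the variance of the estimate.
    beta = np.cov(wl, weights, ddof=1)[0, 1] / var_w
    # Subtract the part of the estimate explained by the weights' deviation
    # from their known mean of 1.
    return float(np.mean(wl) - beta * (np.mean(weights) - 1.0))


if __name__ == "__main__":
    # Toy setup (hypothetical): training data from N(0, 0.75^2), target
    # N(0, 1). The narrower training distribution yields occasional large
    # weights in the tails.
    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 0.75, size=200)
    log_w = np.log(0.75) - 0.5 * x**2 + 0.5 * (x / 0.75) ** 2
    w = np.exp(log_w)
    losses = x**2  # stand-in loss; its true target risk is 1
    print("plain IW estimate:      ", iw_risk(losses, w))
    print("control-variate estimate:", iw_risk_control_variate(losses, w))
```

One property of this particular choice is easy to verify by hand: if every validation sample has the same loss c, the corrected estimate equals c (up to floating-point error) no matter how large the individual weights are, whereas the plain estimate scales with the sample mean of the weights.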

Code & Models

Repositories

wmkouw/ctrl-iwxval
