TL;DR
This paper addresses the challenge of sample selection bias in cross-validation by analyzing the variance of importance-weighted risk estimators and introducing a control variate to improve robustness against large weights.
Contribution
The paper analyzes the variance of the importance-weighted risk estimator and proposes a control variate that makes the estimator more robust to large weights under sample selection bias.
Findings
The sampling variance of the importance-weighted risk estimator depends on the training data distribution.
Introducing a control variate reduces the impact of problematically large weights.
Hyperparameter estimation improves in biased sampling scenarios.
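For reference, the standard form of the importance-weighted risk estimator and its sampling variance can be written as follows. The notation is an assumption made for illustration and is not taken from the paper: $p_S$ and $p_T$ denote the training (biased) and target distributions, $w = p_T / p_S$ is assumed known exactly, $\ell$ is the loss of a classifier $h$, and the $n$ training samples are assumed i.i.d.

```latex
\hat{R}_w(h) = \frac{1}{n} \sum_{i=1}^{n} w(x_i)\, \ell\big(h(x_i), y_i\big),
\qquad w(x) = \frac{p_T(x)}{p_S(x)}

\mathbb{E}_{S}\big[\hat{R}_w(h)\big] = R_T(h)

\mathbb{V}_{S}\big[\hat{R}_w(h)\big]
  = \frac{1}{n} \Big( \mathbb{E}_{S}\big[ w(x)^2\, \ell\big(h(x), y\big)^2 \big] - R_T(h)^2 \Big)
```

The second moment $\mathbb{E}_S[w^2 \ell^2]$ is taken under the training distribution, which is why the variance grows when that distribution makes large weights likely.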
Abstract
Cross-validation under sample selection bias can, in principle, be done by importance-weighting the empirical risk. However, the importance-weighted risk estimator produces sub-optimal hyperparameter estimates in problem settings where large weights arise with high probability. We study its sampling variance as a function of the training data distribution and introduce a control variate to increase its robustness to problematically large weights.
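The idea in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's exact estimator: it uses the weights themselves as the control variate (their expectation under the training distribution is 1 when they are exact density ratios) and estimates the coefficient `beta` from the same sample. The function names `importance_weighted_risk` and `control_variate_risk` are hypothetical.

```python
import numpy as np


def importance_weighted_risk(losses, weights):
    """Plain importance-weighted risk: the mean of w_i * loss_i."""
    return float(np.mean(weights * losses))


def control_variate_risk(losses, weights):
    """Importance-weighted risk with the weights as a control variate.

    Assumption: the weights are exact density ratios p_T / p_S, so their
    expectation under the training distribution equals 1.
    """
    wl = weights * losses
    var_w = np.var(weights, ddof=1)
    if var_w == 0.0:
        # Constant weights carry no information to correct with.
        return float(np.mean(wl))
    # Sample estimate of the variance-minimizing coefficient
    # beta = Cov(w * loss, w) / Var(w).
    beta = np.cov(wl, weights, ddof=1)[0, 1] / var_w
    # Subtract the scaled deviation of the mean weight from its known mean.
    return float(np.mean(wl) - beta * (np.mean(weights) - 1.0))


if __name__ == "__main__":
    # Toy setting (an assumption for illustration): training data from
    # N(0, 1), target data from N(1, 1), so w(x) = exp(x - 0.5).
    rng = np.random.default_rng(0)
    x = rng.normal(loc=0.0, scale=1.0, size=200)
    weights = np.exp(x - 0.5)
    losses = (x - 0.3) ** 2  # squared loss of a fixed, hypothetical predictor
    print("importance-weighted:", importance_weighted_risk(losses, weights))
    print("with control variate:", control_variate_risk(losses, weights))
```

In cross-validation, either function would be applied to the held-out losses of each fold. Because `beta` is estimated from the same sample, the corrected estimator can carry a small finite-sample bias in exchange for lower variance when a few weights dominate.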
