Fast and Accurate Importance Weighting for Correcting Sample Bias
Antoine de Mathelin, Francois Deheeger, Mathilde Mougeot, Nicolas Vayatis

TL;DR
This paper introduces a neural network-based importance weighting method that efficiently corrects sample bias in large datasets, significantly reducing computation time while maintaining accuracy.
Contribution
A novel importance weighting algorithm using neural networks that scales efficiently to large datasets, outperforming existing methods in speed without sacrificing correction quality.
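To make "using a neural network to predict instance weights" concrete, here is a minimal sketch. It trains a small network so that the weighted biased mini-batch matches a target mini-batch under the Gaussian-kernel maximum mean discrepancy (MMD), which is the criterion KMM minimizes. This is an illustration of the general idea, not the authors' algorithm. Everything in it is an assumption made for the example: the hypothetical `WeightNet` architecture, the softplus output, plain mini-batch gradient descent, the kernel bandwidth, the toy data, and all hyperparameters. Only NumPy is used, and the gradients are written by hand.

```python
import numpy as np


def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix with entries exp(-||a - b||^2 / (2 sigma^2))."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def weighted_mmd2(w, Xs, Xt, sigma=1.0):
    """Squared MMD (V-statistic) between the w-weighted source sample and the target sample."""
    n, m = len(Xs), len(Xt)
    return (w @ gaussian_kernel(Xs, Xs, sigma) @ w / n ** 2
            - 2.0 * w @ gaussian_kernel(Xs, Xt, sigma).sum(1) / (n * m)
            + gaussian_kernel(Xt, Xt, sigma).sum() / m ** 2)


class WeightNet:
    """Hypothetical weight predictor: w(x) = softplus(tanh(x W1 + b1) . w2 + b2)."""

    def __init__(self, d, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0, (d, hidden))
        self.b1 = rng.normal(0.0, 1.0, hidden)
        self.w2 = rng.normal(0.0, 0.1, hidden)
        self.b2 = float(np.log(np.e - 1.0))  # softplus(b2) = 1: initial weights close to 1

    def forward(self, X):
        h = np.tanh(X @ self.W1 + self.b1)
        z = h @ self.w2 + self.b2
        return np.logaddexp(0.0, z), h, z  # softplus keeps every weight non-negative

    def step(self, Xs, Xt, lr, sigma=1.0):
        """One gradient step on the mini-batch weighted MMD^2, with hand-derived gradients."""
        n, m = len(Xs), len(Xt)
        w, h, z = self.forward(Xs)
        g_w = (2.0 * gaussian_kernel(Xs, Xs, sigma) @ w / n ** 2
               - 2.0 * gaussian_kernel(Xs, Xt, sigma).sum(1) / (n * m))
        g_z = g_w * 0.5 * (1.0 + np.tanh(0.5 * z))  # softplus'(z) = sigmoid(z)
        g_pre = np.outer(g_z, self.w2) * (1.0 - h ** 2)  # backpropagate through tanh
        self.W1 -= lr * (Xs.T @ g_pre)
        self.b1 -= lr * g_pre.sum(0)
        self.w2 -= lr * (h.T @ g_z)
        self.b2 -= lr * g_z.sum()


# Toy sample bias (an assumption for the demo): the target is shifted relative to the source.
rng = np.random.default_rng(1)
Xs = rng.normal(0.0, 1.0, (400, 1))  # biased sample to be reweighted
Xt = rng.normal(0.5, 1.0, (400, 1))  # sample from the target distribution

net = WeightNet(d=1)
for _ in range(3000):
    # Each step only builds 64 x 64 kernel blocks, whatever the dataset size.
    batch_s = Xs[rng.integers(0, len(Xs), 64)]
    batch_t = Xt[rng.integers(0, len(Xt), 64)]
    net.step(batch_s, batch_t, lr=0.05)

w_hat = net.forward(Xs)[0]
mmd_uniform = weighted_mmd2(np.ones(len(Xs)), Xs, Xt)
mmd_weighted = weighted_mmd2(w_hat, Xs, Xt)
print(f"MMD^2 uniform: {mmd_uniform:.4f}, reweighted: {mmd_weighted:.4f}")
```

In this sketch, the batch size fixes the cost of one update, not the number of points. The trained network can also assign weights to new points without solving a new optimization problem.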
Findings
Reduces computational time on large datasets by orders of magnitude.
Maintains bias correction performance comparable to state-of-the-art methods.
Effective on datasets with up to two million data points.
Abstract
Bias in datasets can be very detrimental to appropriate statistical estimation. In response to this problem, importance weighting methods have been developed to match any biased distribution to its corresponding unbiased target distribution. The seminal Kernel Mean Matching (KMM) method is still considered state of the art in this research field. However, one of its main drawbacks is its computational burden on large datasets. Building on previous work by Huang et al. (2007) and de Mathelin et al. (2021), we derive a novel importance weighting algorithm that scales to large datasets by using a neural network to predict the instance weights. We show, on multiple public datasets and under various sample biases, that our proposed approach drastically reduces the computational time on large datasets while maintaining similar sample bias correction performance…
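The KMM method of Huang et al. (2007) can be written as the quadratic program below. The notation follows that paper rather than this one: x_1, …, x_m are the biased (training) points, x'_1, …, x'_{m'} are the target points, k is a kernel, B is an upper bound on the weights, and ε is a tolerance on their sum. This is included only to show where the computational burden comes from.

```latex
\begin{aligned}
&\min_{\beta \in \mathbb{R}^{m}} \; \tfrac{1}{2}\,\beta^{\top} K \beta - \kappa^{\top} \beta
\quad \text{s.t.} \quad \beta_i \in [0, B], \quad
\Bigl| \textstyle\sum_{i=1}^{m} \beta_i - m \Bigr| \le m\,\epsilon, \\
&K_{ij} = k(x_i, x_j), \qquad
\kappa_i = \frac{m}{m'} \sum_{j=1}^{m'} k(x_i, x'_j).
\end{aligned}
```

Up to the positive factor 2/m² and an additive constant, the objective equals the squared distance between the weighted source and target kernel mean embeddings, ‖(1/m) Σ_i β_i Φ(x_i) − (1/m') Σ_j Φ(x'_j)‖². Because K is an m × m matrix, memory grows quadratically with the number of biased points. In addition, all m weights must be optimized jointly.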
Taxonomy
Topics: Statistical Methods and Inference
