Feature-Weighted Maximum Representative Subsampling
Tony Hauptmann, Stefan Kramer

TL;DR
This paper introduces FW-MRS, a debiasing algorithm that uses feature weights to reduce the impact of biased features during subsampling, maintaining data representativeness without sacrificing downstream task performance.
Contribution
The paper presents FW-MRS, a feature-weighted extension of Maximum Representative Subsampling (MRS) that down-weights highly biased features when computing sample weights. It is validated on multiple datasets with no loss in downstream performance.
Findings
FW-MRS effectively reduces bias in datasets.
Debiasing causes no significant performance loss in downstream tasks.
Applicable to real-world social science data.
Abstract
In the social sciences, it is often necessary to debias studies and surveys before valid conclusions can be drawn. Debiasing algorithms enable the computational removal of bias using sample weights. However, an issue arises when only a subset of features is highly biased, while the rest is already representative. Algorithms need to strongly alter the sample distribution to manage a few highly biased features, which can in turn introduce bias into already representative variables. To address this issue, we developed a method that uses feature weights to minimize the impact of highly biased features on the computation of sample weights. Our algorithm is based on Maximum Representative Subsampling (MRS), which debiases datasets by aligning a non-representative sample with a representative one through iterative removal of elements to create a representative subsample. The new algorithm,…
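The iterative removal described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the text above does not state how elements are chosen for removal or how feature weights are derived. The sketch therefore assumes a hypothetical stand-in criterion (a feature-weighted squared difference of column means) and takes the feature weights as a given input. The names `weighted_discrepancy` and `greedy_weighted_subsample` are invented for illustration.

```python
import numpy as np


def weighted_discrepancy(sample, reference_mean, feature_weights):
    """Feature-weighted squared difference between the sample's column
    means and the reference means (hypothetical stand-in criterion)."""
    diff = sample.mean(axis=0) - reference_mean
    return float(np.sum(feature_weights * diff ** 2))


def greedy_weighted_subsample(sample, reference, feature_weights, min_size=1):
    """Greedily drop rows of `sample` while the weighted discrepancy to
    `reference` keeps decreasing.

    Assumption: a low feature weight means the feature has little
    influence on which rows are removed. Returns the kept row indices
    and the final discrepancy.
    """
    sample = np.asarray(sample, dtype=float)
    reference_mean = np.asarray(reference, dtype=float).mean(axis=0)
    weights = np.asarray(feature_weights, dtype=float)
    min_size = max(int(min_size), 1)  # never empty the subsample

    keep = np.arange(len(sample))
    current = weighted_discrepancy(sample[keep], reference_mean, weights)

    while len(keep) > min_size:
        total = sample[keep].sum(axis=0)
        # Column means of the subsample after removing each candidate row.
        means_without = (total - sample[keep]) / (len(keep) - 1)
        scores = ((means_without - reference_mean) ** 2 * weights).sum(axis=1)
        best = int(np.argmin(scores))
        if scores[best] >= current:
            break  # no single removal improves the fit any further
        current = float(scores[best])
        keep = np.delete(keep, best)

    return keep, current
```

Under these assumptions, setting a feature's weight close to zero keeps a strongly biased feature from driving removals, so rows are not discarded only to correct that one feature at the expense of the others.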
Taxonomy
Topics: Computational and Text Analysis Methods · Advanced Graph Neural Networks · Mobile Crowdsensing and Crowdsourcing
