Multi-characteristic Subject Selection from Biased Datasets
Tahereh Arabghalizi, Alexandros Labrinidis

TL;DR
This paper introduces a constrained optimization method for selecting diverse subjects from biased datasets, improving sampling accuracy in experimental studies involving human participants.
Contribution
It proposes a novel optimization-based approach for multi-characteristic subject selection that effectively mitigates bias in datasets.
Findings
Outperforms baseline methods by up to 90% in various experiments
Effective in selecting diverse subjects from biased datasets
Applicable to real-world experimental settings
Abstract
Subject selection plays a critical role in experimental studies, especially ones with human subjects. Anecdotal evidence suggests that many such studies, done at or near university campus settings, suffer from selection bias, i.e., the too-many-college-kids-as-subjects problem. Unfortunately, traditional sampling techniques, when applied over biased data, will typically return biased results. In this paper, we tackle the problem of multi-characteristic subject selection from biased datasets. We present a constrained optimization-based method that finds the best possible sampling fractions for the different population subgroups, based on the desired sampling fractions provided by the researcher running the subject selection. We perform an extensive experimental study, using a variety of real datasets. Our results show that our proposed method outperforms the baselines for all problem…
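To make the idea of "best possible sampling fractions" concrete, here is a minimal sketch in Python. It is not the authors' formulation, which the abstract does not spell out. It assumes a simple stand-in objective: choose integer counts per subgroup that sum to the sample size, never exceed the number of available candidates in a subgroup, and minimize the squared deviation from the researcher's desired counts. The function name `allocate_counts`, the subgroup labels, and all numbers are hypothetical.

```python
from typing import Dict, Hashable


def allocate_counts(
    available: Dict[Hashable, int],
    desired: Dict[Hashable, float],
    n: int,
) -> Dict[Hashable, int]:
    """Pick how many subjects to draw from each subgroup (illustrative only).

    Assumed objective (not taken from the paper): minimize
    sum_g (count_g - n * desired_g)^2 subject to
    0 <= count_g <= available_g and sum_g count_g == n.
    """
    if n > sum(available.values()):
        raise ValueError("not enough candidates to draw n subjects")

    targets = {g: n * desired.get(g, 0.0) for g in available}
    counts = {g: 0 for g in available}

    # The objective is a sum of convex per-subgroup terms, so adding one
    # subject at a time to the subgroup with the smallest marginal cost
    # reaches an optimal integer allocation.
    for _ in range(n):
        open_groups = [g for g in available if counts[g] < available[g]]
        # Marginal cost of going from c to c + 1: (c+1-t)^2 - (c-t)^2.
        best = min(open_groups, key=lambda g: 2 * (counts[g] - targets[g]) + 1)
        counts[best] += 1
    return counts


# Hypothetical biased pool: students dominate, non-students are scarce.
available = {
    ("F", "student"): 50,
    ("M", "student"): 40,
    ("F", "non-student"): 3,
    ("M", "non-student"): 7,
}
# The researcher would like equal representation of the four subgroups.
desired = {g: 0.25 for g in available}

counts = allocate_counts(available, desired, n=18)
fractions = {g: c / 18 for g, c in counts.items()}
```

In this toy pool only three female non-students are available, so that subgroup is capped below its desired share and the shortfall is spread evenly over the other three subgroups. Each subgroup here is a combination of two characteristics, which is how a multi-characteristic selection could be encoded under this sketch's assumptions.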
Taxonomy
Topics: Machine Learning and Algorithms · Advanced Bandit Algorithms Research · Imbalanced Data Classification Techniques
