Balanced Filtering via Disclosure-Controlled Proxies
Siqi Deng, Emily Diana, Michael Kearns, and Aaron Roth

TL;DR
This paper introduces a method for collecting datasets that are balanced with respect to sensitive groups. Examples are filtered through a disclosure-controlled proxy function whose output reveals little more about any individual's group membership than base rates alone.
Contribution
We propose a novel proxy-based filtering mechanism that controls information disclosure about sensitive groups while maintaining sample efficiency and fairness in data collection.
Findings
The proxy function effectively balances group representation.
The proxy's output reveals little more about an individual's group membership than population base rates already do.
Experimental results demonstrate practical effectiveness.
Abstract
We study the problem of collecting a cohort or set that is balanced with respect to sensitive groups when group membership is unavailable or prohibited from use at deployment time. Specifically, our deployment-time collection mechanism does not reveal significantly more about the group membership of any individual sample than can be ascertained from base rates alone. To do this, we study a learner that can use a small set of labeled data to train a proxy function that can later be used for this filtering or selection task. We then associate the range of the proxy function with sampling probabilities; given a new example, we classify it using our proxy function and then select it with probability corresponding to its proxy classification. Importantly, we require that the proxy classification does not reveal significantly more information about the sensitive group membership of any…
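The selection step described in the abstract (classify each new example with the proxy, then keep it with a probability tied to the proxy's output) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `filter_by_proxy`, `proxy`, and `accept_prob` are hypothetical, the proxy is assumed to return one of a finite set of labels, and the sketch does not show how the proxy is trained or how its disclosure is controlled.

```python
import random
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

X = TypeVar("X")


def filter_by_proxy(
    examples: Iterable[X],
    proxy: Callable[[X], Hashable],
    accept_prob: Dict[Hashable, float],
    rng: random.Random,
) -> List[X]:
    """Keep each example with the probability assigned to its proxy label.

    Assumptions (not from the paper): `proxy` maps an example to a label in a
    finite set, and `accept_prob` gives a sampling probability in [0, 1] for
    every label the proxy can output.
    """
    selected = []
    for x in examples:
        label = proxy(x)  # the true sensitive attribute is never consulted here
        if rng.random() < accept_prob[label]:
            selected.append(x)
    return selected


# Toy usage: a hypothetical proxy that thresholds a single numeric feature.
rng = random.Random(0)
stream = [rng.random() for _ in range(1000)]
kept = filter_by_proxy(
    stream,
    proxy=lambda v: int(v > 0.8),        # label 1 is the rarer proxy class
    accept_prob={0: 0.25, 1: 1.0},       # downsample the more common class
    rng=rng,
)
```

In this toy setup the acceptance probabilities are chosen so that the two proxy classes are kept at roughly equal rates; whether that balances the true groups depends on how the proxy relates to group membership.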
Taxonomy
Methods: Balanced Selection
