Correcting Underrepresentation and Intersectional Bias for Classification

Emily Diana; Alexander Williams Tolbert

arXiv:2306.11112·cs.LG·June 5, 2024·2 cites

TL;DR

This paper introduces a method to correct underrepresentation and intersectional bias in classification tasks by estimating group-wise dropout rates with limited unbiased data, enabling more accurate learning from biased datasets.

Contribution

The authors propose a novel reweighting scheme and algorithm that efficiently estimates intersectional dropout rates and corrects bias, extending PAC learnability to biased data settings.

Findings

1. Effective estimation of group-wise dropout rates with minimal unbiased data
2. A reweighting scheme that approximates true distribution loss from biased samples
3. Algorithm enabling efficient learning under intersectional bias constraints
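
To make the second finding concrete, here is one way such a reweighting identity can be written. It is a sketch under assumptions that are not taken from this page: groups are disjoint, each positive example in group $g$ is retained independently with probability $\beta_g > 0$, and negatives are always retained. The symbols $D$ (true distribution), $D_\beta$ (biased distribution), $\beta_g$, and $w$ are illustrative notation and may differ from the paper's.

```latex
% Assumed model: positives in group g survive filtering with probability beta_g.
\[
  \mathbb{E}_{(x,y)\sim D}\bigl[\ell(h(x),y)\bigr]
  \;=\;
  \frac{\mathbb{E}_{(x,y)\sim D_\beta}\bigl[w(g,y)\,\ell(h(x),y)\bigr]}
       {\mathbb{E}_{(x,y)\sim D_\beta}\bigl[w(g,y)\bigr]},
  \qquad
  w(g,y) =
  \begin{cases}
    1/\beta_g & \text{if } y = 1,\\
    1         & \text{if } y = 0.
  \end{cases}
\]
```

Under this model the biased density equals the true density times $1/w(g,y)$, up to a normalizing constant, so weighting each biased example by $w$ and normalizing by the total weight recovers the loss on $D$.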

Abstract

We consider the problem of learning from data corrupted by underrepresentation bias, where positive examples are filtered from the data at different, unknown rates for a fixed number of sensitive groups. We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out rates, even in settings where intersectional group membership makes learning each intersectional rate computationally infeasible. Using these estimates, we construct a reweighting scheme that allows us to approximate the loss of any hypothesis on the true distribution, even if we only observe the empirical error on a biased sample. From this, we present an algorithm encapsulating this learning and reweighting process along with a thorough empirical investigation. Finally, we define a bespoke notion of PAC learnability for the underrepresentation and intersectional bias setting and show…
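
The estimate-then-reweight idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it assumes disjoint groups (so it sidesteps the intersectional case), assumes only positives are filtered, and uses hypothetical helper names `estimate_keep_rates` and `reweighted_error`. The retention rate of each group is estimated by comparing the positive-to-negative odds in the biased sample with the odds in a small unbiased sample.

```python
from collections import Counter


def estimate_keep_rates(biased, unbiased):
    """Estimate, per group, the fraction of positives that survived filtering.

    Both arguments are lists of (group, label) pairs with label in {0, 1}.
    Assumption: negatives are never dropped, so the ratio of positive-to-negative
    odds between the biased and unbiased samples estimates the retention rate.
    """
    def odds(sample):
        counts = Counter(sample)
        groups = {g for g, _ in sample}
        # Skip groups with no negatives, where the odds are undefined.
        return {g: counts[(g, 1)] / counts[(g, 0)]
                for g in groups if counts[(g, 0)] > 0}

    odds_b, odds_u = odds(biased), odds(unbiased)
    return {g: odds_b[g] / odds_u[g]
            for g in odds_b if g in odds_u and odds_u[g] > 0}


def reweighted_error(predictions, biased, keep):
    """Self-normalized weighted 0-1 error on the biased sample.

    Positives in group g are up-weighted by 1 / keep[g]; negatives get weight 1.
    """
    total = 0.0
    wrong = 0.0
    for pred, (g, y) in zip(predictions, biased):
        w = 1.0 / keep[g] if y == 1 else 1.0
        total += w
        if pred != y:
            wrong += w
    return wrong / total
```

With a balanced unbiased sample and a biased sample in which one group keeps half of its positives and another keeps a fifth, the reweighted error of a constant classifier on the biased sample matches its error on the unbiased sample, whereas the unweighted error understates it. A group whose positives were dropped entirely cannot be corrected this way, since there is nothing left to up-weight.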

Taxonomy

Topics: Healthcare cost, quality, practices · Advanced Causal Inference Techniques · Privacy-Preserving Technologies in Data