When Can Memorization Improve Fairness?

Bob Pepin; Christian Igel; Raghavendra Selvan

arXiv:2412.09254·cs.LG·December 13, 2024

TL;DR

This paper investigates how memorizing parts of a dataset can influence fairness metrics in multi-class classification, providing explicit bias expressions and conditions for bias elimination.

Contribution

It gives explicit formulas for the bias induced by memorization, characterizes the memorized datasets that eliminate bias, and bounds the probability mass that must be memorized to eliminate bias completely.

Findings

1. Explicit bias expressions in terms of the data distribution
2. Conditions for bias elimination through memorization
3. Bounds on the probability mass of the memorized dataset needed for complete bias removal

Abstract

We study the extent to which additive fairness metrics (statistical parity, equal opportunity and equalized odds) can be influenced in a multi-class classification problem by memorizing a subset of the population. We give explicit expressions for the bias resulting from memorization in terms of the label and group membership distribution of the memorized dataset and the classifier bias on the unmemorized dataset. We also characterize the memorized datasets that eliminate the bias for all three metrics considered. Finally, we provide upper and lower bounds on the total probability mass in the memorized dataset that is necessary for the complete elimination of these biases.
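
The abstract does not spell out the bias expressions, so what follows is only a minimal toy sketch under stated assumptions, not the authors' formulas. It assumes two groups, a three-class problem, and a memorizing classifier that returns the stored true label on points in the memorized set M and the base classifier's prediction f(X) elsewhere. Statistical parity and equal opportunity are measured here as per-class differences between the two groups. The toy population, the `Point` fields, and every function name are hypothetical. `decomposed_rate` splits P(Yhat=k | A=a) over M and its complement with the law of total probability. This shows how the label and group distribution of M and the base classifier's behavior off M both enter the resulting rates.

```python
from fractions import Fraction
from typing import Callable, List, NamedTuple, Set


class Point(NamedTuple):
    group: int        # protected attribute A (two groups: 0 and 1)
    label: int        # true class Y
    base_pred: int    # prediction f(X) of the base (non-memorizing) classifier
    mass: Fraction    # probability mass of this point in the population
    memorized: bool   # True if the point belongs to the memorized set M


def predict(p: Point) -> int:
    """Hypothetical memorizing classifier: stored label on M, base prediction elsewhere."""
    return p.label if p.memorized else p.base_pred


def cond_prob(points: List[Point], event: Callable[[Point], bool],
              given: Callable[[Point], bool]) -> Fraction:
    """P(event | given) computed exactly from the point masses."""
    denom = sum((p.mass for p in points if given(p)), Fraction(0))
    num = sum((p.mass for p in points if given(p) and event(p)), Fraction(0))
    return num / denom


def parity_gap(points: List[Point], k: int) -> Fraction:
    """Statistical parity gap for class k: P(Yhat=k | A=0) - P(Yhat=k | A=1)."""
    r0 = cond_prob(points, lambda p: predict(p) == k, lambda p: p.group == 0)
    r1 = cond_prob(points, lambda p: predict(p) == k, lambda p: p.group == 1)
    return r0 - r1


def opportunity_gap(points: List[Point], k: int) -> Fraction:
    """Equal opportunity gap for class k: P(Yhat=k | Y=k, A=0) - P(Yhat=k | Y=k, A=1)."""
    r0 = cond_prob(points, lambda p: predict(p) == k, lambda p: p.label == k and p.group == 0)
    r1 = cond_prob(points, lambda p: predict(p) == k, lambda p: p.label == k and p.group == 1)
    return r0 - r1


def decomposed_rate(points: List[Point], k: int, a: int) -> Fraction:
    """P(Yhat=k | A=a) = P(M|a) P(Y=k | M, a) + P(not M|a) P(f=k | not M, a)."""
    def in_group(p: Point) -> bool:
        return p.group == a

    p_mem = cond_prob(points, lambda p: p.memorized, in_group)
    rate = Fraction(0)
    if p_mem > 0:
        # On M the prediction equals the true label.
        rate += p_mem * cond_prob(points, lambda p: p.label == k,
                                  lambda p: in_group(p) and p.memorized)
    if p_mem < 1:
        # Off M the prediction is the base classifier's.
        rate += (1 - p_mem) * cond_prob(points, lambda p: p.base_pred == k,
                                        lambda p: in_group(p) and not p.memorized)
    return rate


def toy_population(memorize: Set[int]) -> List[Point]:
    """Hypothetical 3-class toy population; row indices in `memorize` are placed in M."""
    rows = [(0, 0, 0), (0, 1, 0), (0, 2, 2), (0, 1, 1),   # group 0: (A, Y, f(X))
            (1, 0, 1), (1, 1, 1), (1, 2, 1), (1, 0, 0)]   # group 1
    m = Fraction(1, len(rows))  # uniform mass on the 8 points
    return [Point(a, y, f, m, i in memorize) for i, (a, y, f) in enumerate(rows)]


before = toy_population(set())
after = toy_population({4, 6})  # memorize two group-1 points
print([str(parity_gap(before, k)) for k in range(3)])  # ['1/4', '-1/2', '1/4']
print([str(parity_gap(after, k)) for k in range(3)])   # ['0', '0', '0']
print(opportunity_gap(after, 1))                        # -1/2
```

In this toy population, memorizing two group-1 points (a quarter of the probability mass) brings the statistical parity gap to zero for every class, while the equal-opportunity gap for class 1 stays at -1/2. Here, removing one gap does not automatically remove the others.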

Taxonomy

Topics: Complex Systems and Decision Making