Learning from Discriminatory Training Data

Przemyslaw A. Grabowicz; Nicholas Perello; Kenta Takatsu

arXiv:1912.08189 · cs.LG · January 22, 2026 · 1 citation

TL;DR

This paper introduces a fair learning method that trains on potentially discriminatory data yet minimizes model error on fair test datasets. It relies on probabilistic interventions and a causal formulation to prevent discrimination while maintaining accuracy.

Contribution

The paper proposes a novel, computationally lightweight fair learning approach that addresses both direct and indirect discrimination through probabilistic interventions and causal reasoning.

Findings

1. The method provably minimizes error on fair datasets.
2. It is compatible with existing supervised models.
3. It balances fairness with model accuracy.
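The finding that the method works with existing supervised models suggests a post-hoc wrapper around any fitted predictor. Below is a minimal sketch of what such a wrapper could look like, assuming the probabilistic intervention replaces each individual's protected attribute Z with values drawn from its marginal distribution P(Z) and averages the model's outputs over that mixture. The helper `marginal_intervention_predict`, the `(X, z)` calling convention and the toy model are hypothetical illustrations, not the authors' API.

```python
import numpy as np

def marginal_intervention_predict(predict, X, z_values, z_probs):
    """Average a fitted model's predictions over the marginal distribution of Z.

    Hypothetical helper, not an API from the paper.
    predict:  callable (X, z) -> predictions of an already fitted model
    X:        (n, d) array of non-protected features
    z_values: distinct values of the protected attribute seen in training
    z_probs:  their empirical probabilities P(Z = z), summing to 1
    """
    n = X.shape[0]
    out = np.zeros(n)
    for z, p in zip(z_values, z_probs):
        # Give every individual the same value z, so the output no longer
        # depends on each individual's own protected attribute.
        out += p * predict(X, np.full(n, z))
    return out


# Usage with a toy model that has a direct (discriminatory) effect of z.
rng = np.random.default_rng(0)
z_train = rng.integers(0, 2, size=1000)
values, counts = np.unique(z_train, return_counts=True)
probs = counts / counts.sum()

def toy_predict(X, z):
    # Hypothetical fitted model: linear in the first feature, penalizes z = 1.
    return 2.0 * X[:, 0] - 1.5 * z

X_new = np.array([[1.0], [1.0]])  # two individuals with identical features
preds = marginal_intervention_predict(toy_predict, X_new, values, probs)
print(preds)
```

Because the two individuals share the same features, they receive the same prediction regardless of group membership. For a continuous Z one would instead average over training samples of Z; that, too, is an assumption about how the intervention is realized.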

Abstract

Supervised learning systems are trained using historical data and, if the data was tainted by discrimination, they may unintentionally learn to discriminate against protected groups. We propose that fair learning methods, despite training on potentially discriminatory datasets, shall perform well on fair test datasets. Such dataset shifts crystallize application scenarios for specific fair learning methods. For instance, the removal of direct discrimination can be represented as a particular dataset shift problem. For this scenario, we propose a learning method that provably minimizes model error on fair datasets, while blindly training on datasets poisoned with direct additive discrimination. The method is compatible with existing legal systems and provides a solution to the widely discussed issue of protected groups' intersectionality by striking a balance between the protected…
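A toy simulation can illustrate why training on all features, including the protected one, and then intervening may beat simply deleting the protected attribute. The setup is our own assumption, not the paper's experiment: a feature x correlated with a binary protected attribute z, and labels poisoned by an additive penalty delta on z. It compares (a) ordinary least squares that drops z, where x acts as a proxy for z, and (b) least squares that includes z, followed by an intervention that replaces z with its mean. For a linear model, averaging over P(Z) reduces to plugging in E[Z]. All names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000
beta, delta = 2.0, -1.5  # assumed true effect of x; assumed additive discrimination on z

z = rng.integers(0, 2, size=n)                     # protected attribute (binary)
x = 0.8 * z + rng.normal(size=n)                   # feature correlated with z (a proxy)
y_fair = beta * x + rng.normal(scale=0.5, size=n)  # outcome in a hypothetical fair world
y_disc = y_fair + delta * z                        # observed, discriminatory labels

def ols(design, target):
    # Least-squares coefficients via numpy's lstsq
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

ones = np.ones(n)

# (a) Drop z and regress on x only: the slope absorbs part of the z effect.
a_int, a_slope = ols(np.column_stack([ones, x]), y_disc)

# (b) Train on x and z, then intervene by replacing z with its marginal mean.
b_int, b_slope, b_z = ols(np.column_stack([ones, x, z]), y_disc)
pred_b = b_int + b_slope * x + b_z * z.mean()  # depends on x only

print(f"true slope {beta:.2f} | drop-z slope {a_slope:.2f} | intervention slope {b_slope:.2f}")
```

In this setup the drop-z slope is biased by roughly delta * Cov(x, z) / Var(x), so x carries indirect discrimination, while the intervened model recovers the slope on x and ignores each individual's z. Relative to `y_fair`, it still carries a constant shift of about delta * E[z], which moves every prediction equally.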

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI)

Methods: Test