Overcoming Representation Bias in Fairness-Aware data Repair using Optimal Transport

Abigail Langbridge; Anthony Quinn; Robert Shorten

arXiv:2410.02840 · cs.LG · March 11, 2026



TL;DR

This paper introduces a Bayesian nonparametric approach to fairness-aware data repair with optimal transport (OT) that mitigates representation bias and makes it possible to repair out-of-sample data.

Contribution

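It proposes a novel Bayesian nonparametric stopping rule for learning each attribute-labelled component of the data distribution. The induced OT-optimal quantization operators make fairness-aware data repair more tolerant of representation bias.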

Findings

1. Effective bias mitigation demonstrated on benchmark datasets
2. Improved out-of-sample data repair capability
3. Trade-offs between fairness and data integrity established

Abstract

Optimal transport (OT) has an important role in transforming data distributions in a manner which engenders fairness. Typically, the OT operators are learnt from the unfair attribute-labelled data, and then used for their repair. Two significant limitations of this approach are as follows: (i) the OT operators for underrepresented subgroups are poorly learnt (i.e. they are susceptible to representation bias); and (ii) these OT repairs cannot be effected on identically distributed but out-of-sample (i.e. archival) data. In this paper, we address both of these problems by adopting a Bayesian nonparametric stopping rule for learning each attribute-labelled component of the data distribution. The induced OT-optimal quantization operators can then be used to repair the archival data. We formulate a novel definition of the fair distributional target, along with quantifiers that allow us to…
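
The abstract describes learning an OT operator for each attribute-labelled group and then reusing those operators on out-of-sample (archival) data. As a rough, non-authoritative illustration of that general idea, the sketch below implements a generic one-dimensional quantile-based OT repair toward a Wasserstein-2 barycenter. It is NOT the authors' Bayesian nonparametric method: it uses plain empirical quantiles with no stopping rule, and the barycenter target, the helper names `fit_quantile_maps`, `repair` and `mean_gap`, and the blending parameter `lam` are all assumptions chosen for illustration.

```python
import numpy as np


def fit_quantile_maps(x, s, n_levels=101):
    """Learn per-group quantile functions and their 1-D barycenter.

    Hypothetical helper: one continuous feature `x`, a discrete protected
    attribute `s`, and plain empirical quantiles (no Bayesian stopping rule).
    """
    x, s = np.asarray(x, dtype=float), np.asarray(s)
    levels = np.linspace(0.0, 1.0, n_levels)  # shared quantile grid
    groups = np.unique(s)
    # Each group's empirical quantile function, evaluated on the grid.
    q = {g: np.quantile(x[s == g], levels) for g in groups}
    # Weight groups by their share of the training data.
    w = {g: np.mean(s == g) for g in groups}
    # In 1-D, the W2 barycenter's quantile function is the weighted
    # average of the groups' quantile functions.
    bary = sum(w[g] * q[g] for g in groups)
    return levels, q, bary


def repair(x, s, levels, q, bary, lam=1.0):
    """Transport each point to the barycenter: u = F_g(x), x' = Q_bar(u).

    `lam` (hypothetical) blends original and repaired values: 0 = no repair,
    1 = full repair. Points from groups unseen at fit time are left unchanged.
    """
    x, s = np.asarray(x, dtype=float), np.asarray(s)
    out = x.copy()
    for g, qg in q.items():
        mask = s == g
        # Group CDF via inverse interpolation of the stored quantiles;
        # values outside the training range are clamped to levels 0 or 1.
        u = np.interp(x[mask], qg, levels)
        target = np.interp(u, levels, bary)
        out[mask] = (1.0 - lam) * x[mask] + lam * target
    return out


# Usage: fit on an imbalanced "training" sample, then repair fresh
# out-of-sample ("archival") data drawn from the same distributions.
rng = np.random.default_rng(0)
x_train = np.concatenate([rng.normal(0.0, 1.0, 1800), rng.normal(2.0, 1.0, 200)])
s_train = np.repeat([0, 1], [1800, 200])
levels, q, bary = fit_quantile_maps(x_train, s_train)

x_new = np.concatenate([rng.normal(0.0, 1.0, 1000), rng.normal(2.0, 1.0, 1000)])
s_new = np.repeat([0, 1], [1000, 1000])


def mean_gap(v, s):
    """Absolute difference in group means (a crude disparity measure)."""
    return abs(v[s == 1].mean() - v[s == 0].mean())


gap_before = mean_gap(x_new, s_new)
gap_full = mean_gap(repair(x_new, s_new, levels, q, bary), s_new)
gap_half = mean_gap(repair(x_new, s_new, levels, q, bary, lam=0.5), s_new)
print(f"gap before: {gap_before:.2f}, half repair: {gap_half:.2f}, full: {gap_full:.2f}")
```

Because the fitted quantile grids are stored, the same maps can be applied to new samples without refitting, which is the out-of-sample use case the abstract highlights. The minority group's quantiles are estimated from far fewer points, so its map is noisier; this is a simple instance of the representation-bias problem the paper targets. The `lam` blend is one common way to expose a trade-off between reducing group disparity and preserving the original values.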
