Trusting Fair Data: Leveraging Quality in Fairness-Driven Data Removal Techniques

Manh Khoi Duong; Stefan Conrad

arXiv:2405.12926·cs.LG·September 24, 2024

TL;DR

This paper introduces a multi-objective optimization approach to fair data removal that balances fairness with data retention, enhancing trustworthiness of bias mitigation techniques in machine learning.

Contribution

The paper proposes a multi-objective framework for fair data removal that yields Pareto-optimal subsets. Beyond fairness, the subsets must also satisfy group coverage and minimal data loss, which addresses trustworthiness and data quality concerns.

Findings

1. Balances fairness and data retention effectively
2. Provides Pareto-optimal solutions for data subset selection
3. Distributed as a Python package for practical use

Abstract

In this paper, we deal with bias mitigation techniques that remove specific data points from the training set to aim for a fair representation of the population in that set. Machine learning models are trained on these pre-processed datasets, and their predictions are expected to be fair. However, such approaches may exclude relevant data, making the attained subsets less trustworthy for further usage. To enhance the trustworthiness of prior methods, we propose additional requirements and objectives that the subsets must fulfill in addition to fairness: (1) group coverage, and (2) minimal data loss. While removing entire groups may improve the measured fairness, this practice is very problematic as failing to represent every group cannot be considered fair. In our second concern, we advocate for the retention of data while minimizing discrimination. By introducing a multi-objective…
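The trade-off described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' package or algorithm: the toy dataset, the function names (`objectives`, `dominates`, `pareto_front`), the choice of discrimination measure (the gap between the highest and lowest positive-label rate across groups), and the brute-force enumeration of subsets are all assumptions made for illustration. Group coverage is treated as a hard constraint, and the two minimized objectives are the disparity and the fraction of data removed.

```python
from fractions import Fraction
from itertools import combinations

# Toy dataset of (group, label) pairs. Entirely hypothetical.
DATA = [
    ("a", 1), ("a", 1), ("a", 0),
    ("b", 1), ("b", 0), ("b", 0),
]
GROUPS = {g for g, _ in DATA}


def objectives(keep):
    """Return (disparity, data_loss) for the kept indices.

    Returns None when some group has no remaining point, i.e. when
    the group-coverage requirement is violated.
    """
    subset = [DATA[i] for i in keep]
    rates = []
    for g in GROUPS:
        labels = [y for grp, y in subset if grp == g]
        if not labels:
            return None  # a whole group was removed
        rates.append(Fraction(sum(labels), len(labels)))
    # Assumed fairness measure: max gap in positive-label rates.
    disparity = max(rates) - min(rates)
    # Fraction of the original data that was removed.
    data_loss = 1 - Fraction(len(subset), len(DATA))
    return disparity, data_loss


def dominates(p, q):
    """True if p is no worse than q in every objective and differs from q."""
    return all(a <= b for a, b in zip(p, q)) and p != q


def pareto_front(candidates):
    """Keep the (subset, objectives) pairs that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other[1], c[1]) for other in candidates)
    ]


# Brute-force enumeration; only feasible for tiny datasets.
candidates = []
for size in range(1, len(DATA) + 1):
    for keep in combinations(range(len(DATA)), size):
        obj = objectives(keep)
        if obj is not None:
            candidates.append((keep, obj))

front = pareto_front(candidates)
front_values = sorted({obj for _, obj in front})

for disparity, loss in front_values:
    print(f"disparity={disparity}, data_loss={loss}")
```

Exact fractions are used instead of floats so that subsets with equal objective values are not separated by rounding error during the dominance check. Each point on the resulting front is a different compromise: keeping all data leaves the full disparity, while removing more points can lower it, and no subset on the front drops a group entirely.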

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Privacy, Security, and Data Protection · Cloud Data Security Solutions

Methods: Sparse Evolutionary Training