DispaRisk: Auditing Fairness Through Usable Information

Jonathan Vasquez; Carlotta Domeniconi; Huzefa Rangwala

arXiv:2405.12372 · cs.LG · June 2, 2025

Open Access · 1 Repo

TL;DR

DispaRisk is a new framework that uses usable information theory to proactively identify and assess potential biases in datasets and models early in the machine learning pipeline, aiming to improve fairness.

Contribution

The paper introduces DispaRisk, an early-stage bias risk assessment framework built on usable information theory, with the goal of improving fairness in machine learning systems.

Findings

01. DispaRisk effectively identifies datasets at high risk of discrimination.

02. It detects model families that are prone to bias within ML pipelines.

03. The framework improves the explainability of bias risks.

Abstract

Machine learning (ML) algorithms impact virtually every aspect of human life and are used across diverse sectors, including healthcare, finance, and education. ML algorithms have often been found to exacerbate societal biases present in datasets, leading to adverse impacts on subsets or groups of individuals, and in many cases on minority groups. To mitigate these effects, it is crucial that disparities and biases are identified early in an ML pipeline. This proactive approach facilitates timely interventions to prevent bias amplification and reduces complexity at later stages of model development. In this paper, we leverage recent advancements in usable information theory to introduce DispaRisk, a novel framework designed to proactively assess the potential risks of disparities in datasets during the initial stages of the ML pipeline. We evaluate DispaRisk's…

Code & Models

Repositories

jovasque156/disparisk · PyTorch · Official

Taxonomy

Topics: Computational and Text Analysis Methods · Data Quality and Management · Artificial Intelligence in Healthcare