DispaRisk: Auditing Fairness Through Usable Information
Jonathan Vasquez, Carlotta Domeniconi, Huzefa Rangwala

TL;DR
DispaRisk is a new framework that uses usable information theory to proactively identify and assess potential biases in datasets and models early in the machine learning pipeline, aiming to improve fairness.
Contribution
The paper introduces DispaRisk, an early-stage bias risk assessment tool built on usable information theory and intended to support fairer machine learning systems.
Findings
DispaRisk identifies datasets that carry a high risk of discrimination.
It detects model families prone to biases within ML pipelines.
The framework improves explainability of bias risks.
Abstract
Machine Learning (ML) algorithms impact virtually every aspect of human life and are used across diverse sectors, including healthcare, finance, and education. ML algorithms have often been found to exacerbate societal biases present in datasets, leading to adverse impacts on subsets/groups of individuals and, in many cases, on minority groups. To mitigate these untoward effects effectively, it is crucial that disparities/biases are identified early in an ML pipeline. This proactive approach facilitates timely interventions to prevent bias amplification and reduces complexity at later stages of model development. In this paper, we leverage recent advancements in usable information theory to introduce DispaRisk, a novel framework designed to proactively assess the potential risks of disparities in datasets during the initial stages of the ML pipeline. We evaluate DispaRisk's…
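For readers unfamiliar with usable information, it is commonly defined relative to a predictive family V as I_V(X → Y) = H_V(Y) − H_V(Y | X): the reduction in the best achievable log loss when models in V are allowed to see X. The sketch below illustrates only that definition and is not the DispaRisk procedure, which the truncated abstract does not describe. It assumes discrete features and takes V to be simple frequency-table predictors fitted and scored on the same sample; the function names are hypothetical. With this particular family the estimate coincides with the plug-in mutual information; richer families (for example, neural networks trained to minimise log loss) would generally give different values.

```python
import math
from collections import Counter


def empirical_entropy(ys):
    """H_V(Y): log loss (in bits) of the best constant predictor, i.e. the label frequencies."""
    n = len(ys)
    counts = Counter(ys)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def conditional_entropy(xs, ys):
    """H_V(Y | X): log loss (in bits) of a frequency-table predictor p(y | x).

    Assumption: X is discrete and the table is fitted and scored on the
    same sample, so this is an in-sample (optimistic) estimate.
    """
    n = len(ys)
    joint = Counter(zip(xs, ys))
    marginal_x = Counter(xs)
    return -sum(c / n * math.log2(c / marginal_x[x]) for (x, _), c in joint.items())


def usable_information(xs, ys):
    """I_V(X -> Y) = H_V(Y) - H_V(Y | X) for the table-lookup family."""
    return empirical_entropy(ys) - conditional_entropy(xs, ys)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    informative = [0, 0, 1, 1]    # feature determines the label
    uninformative = [0, 1, 0, 1]  # feature is independent of the label
    print(usable_information(informative, labels))
    print(usable_information(uninformative, labels))
```

A feature that fully determines the label yields one bit of usable information in this toy setting, while an independent feature yields none. How such quantities are combined into a disparity risk score is left to the paper itself.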
Taxonomy
Topics: Computational and Text Analysis Methods · Data Quality and Management · Artificial Intelligence in Healthcare
