Beyond Internal Data: Constructing Complete Datasets for Fairness Testing
Varsha Ramineni, Hossein A. Rahmani, Emine Yilmaz, David Barber

TL;DR
This paper introduces a method for constructing complete synthetic datasets from separate overlapping data sources, enabling fairness testing of AI systems when real demographic data is inaccessible. Fairness metrics computed on the synthetic data are consistent with those computed on real data.
Contribution
It proposes a novel approach to construct synthetic datasets from partial data, facilitating fairness testing without requiring access to sensitive demographic information.
Findings
Synthetic data closely matches real data in fairness assessments
Fairness metrics from synthetic data are consistent with real data
Method enables independent fairness testing in data-restricted environments
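As an illustration of what "consistent fairness metrics" could mean in practice, the sketch below computes one group fairness metric on a real test set and on a synthetic one, then compares the two values. The choice of metric (demographic parity difference), the function name, and the toy data are all assumptions made for this example; the summary above does not state which metrics or datasets the paper uses.

```python
from collections import defaultdict


def demographic_parity_difference(predictions, groups):
    """Largest gap in positive-prediction rate between any two groups.

    Hypothetical helper for illustration; not taken from the paper.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


# Made-up classifier outputs and group labels (not results from the paper).
real_preds = [1, 0, 1, 1, 0, 0, 1, 0]
real_groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

synth_preds = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
synth_groups = ["a"] * 5 + ["b"] * 5

real_gap = demographic_parity_difference(real_preds, real_groups)
synth_gap = demographic_parity_difference(synth_preds, synth_groups)

# A small absolute difference suggests the synthetic data leads to a
# similar fairness assessment as the real data for this metric.
gap_error = abs(real_gap - synth_gap)
print(f"real gap={real_gap:.2f}, synthetic gap={synth_gap:.2f}, error={gap_error:.2f}")
```

The same comparison could be repeated for other group metrics; how small the error must be to count as "consistent" is a judgment call that this sketch does not settle.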
Abstract
As AI becomes prevalent in high-risk domains and decision-making, it is essential to test for potential harms and biases. This urgency is reflected by the global emergence of AI regulations that emphasise fairness and adequate testing, with some mandating independent bias audits. However, procuring the necessary data for fairness testing remains a significant challenge. Particularly in industry settings, legal and privacy concerns restrict the collection of demographic data required to assess group disparities, and auditors face practical and cultural challenges in gaining access to data. Further, internal historical datasets are often insufficiently representative to identify real-world biases. This work focuses on evaluating classifier fairness when complete datasets including demographics are inaccessible. We propose leveraging separate overlapping datasets to construct complete…
Taxonomy
Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Qualitative Comparative Analysis Research
