Does Data Repair Lead to Fair Models? Curating Contextually Fair Data To Reduce Model Bias
Sharat Agarwal, Sumanyu Muku, Saket Anand, Chetan Arora

TL;DR
This paper introduces a data repair algorithm that curates fair, balanced datasets by reducing co-occurrence bias, leading to fairer model predictions across various tasks without sacrificing overall accuracy.
Contribution
The paper presents a novel, simple data repair method that uses the coefficient of variation to curate contextually fair datasets, so that models trained on them are less biased across different tasks.
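As a rough illustration of the statistic named above, the coefficient of variation (standard deviation divided by mean) can be computed over the co-occurrence counts of one class with each protected group. This is a minimal sketch, not the authors' implementation: the function name, the use of the population standard deviation, the zero-mean convention, and the counts are all assumptions made for illustration.

```python
import statistics


def coefficient_of_variation(counts):
    """Return population standard deviation divided by mean for a list of counts."""
    mean = statistics.fmean(counts)
    if mean == 0:
        # Assumption: a class with no co-occurrences is treated as balanced.
        return 0.0
    return statistics.pstdev(counts) / mean


# Hypothetical co-occurrence counts of one object class with two protected groups.
balanced = coefficient_of_variation([50, 50])  # equal counts across groups
skewed = coefficient_of_variation([90, 10])    # one group dominates

print(balanced, skewed)
```

A value near zero means the class co-occurs about equally often with every group; larger values indicate a stronger skew toward one group.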
Findings
Curating contextually fair data improves model fairness across protected groups.
The method maintains overall model performance while reducing bias.
The method applies to various tasks and training scenarios.
Abstract
Contextual information is a valuable cue for Deep Neural Networks (DNNs) to learn better representations and improve accuracy. However, co-occurrence bias in the training dataset may hamper a DNN model's generalizability to unseen scenarios in the real world. For example, in COCO, many object categories have a much higher co-occurrence with men compared to women, which can bias a DNN's prediction in favor of men. Recent works have focused on task-specific training strategies to handle bias in such scenarios, but fixing the available data is often ignored. In this paper, we propose a novel and more generic solution to address the contextual bias in the datasets by selecting a subset of the samples, which is fair in terms of the co-occurrence with various classes for a protected attribute. We introduce a data repair algorithm using the coefficient of variation, which can curate fair and…
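The idea of selecting a subset whose class co-occurrences are balanced across a protected attribute can be sketched with a simple greedy procedure. This is only an assumed illustration, not the algorithm from the paper: the sample format (a set of class labels plus a group label), the helper names `mean_cv` and `greedy_fair_subset`, the fixed `budget`, and the toy data are all hypothetical.

```python
from collections import Counter
import statistics


def mean_cv(samples, groups):
    """Average coefficient of variation of class/group co-occurrence counts."""
    counts = Counter((cls, grp) for classes, grp in samples for cls in classes)
    all_classes = {cls for classes, _ in samples for cls in classes}
    cvs = []
    for cls in all_classes:
        per_group = [counts[(cls, g)] for g in groups]
        mean = statistics.fmean(per_group)
        cvs.append(statistics.pstdev(per_group) / mean if mean else 0.0)
    return statistics.fmean(cvs) if cvs else 0.0


def greedy_fair_subset(samples, groups, budget):
    """Drop one sample at a time, always the one whose removal gives the lowest mean CV."""
    subset = list(samples)
    while len(subset) > budget:
        best = min(
            range(len(subset)),
            key=lambda i: mean_cv(subset[:i] + subset[i + 1:], groups),
        )
        subset.pop(best)
    return subset


# Hypothetical toy data: each sample is (set of object classes, protected group).
groups = ["man", "woman"]
toy = [({"skateboard"}, "man")] * 3 + [({"skateboard"}, "woman")]

subset = greedy_fair_subset(toy, groups, budget=2)
print(mean_cv(toy, groups), mean_cv(subset, groups))
```

The greedy loop re-scores every candidate removal, so it is quadratic in the number of samples and only suitable for small examples; it is meant to show the selection objective, not an efficient procedure.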
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
Methods: Repair
