On the Cause of Unfairness: A Training Sample Perspective
Yuanshun Yao, Yang Liu

TL;DR
This paper investigates the root causes of model unfairness by analyzing how changes in training data attributes affect unfairness, providing tools for understanding, mitigating, and repairing unfair models.
Contribution
It introduces a counterfactual data influence framework to analyze and address unfairness caused by training samples, enabling targeted data repairs and detection of biases.
Findings
Quantifies how training data modifications impact model unfairness
Enables detection of mislabeled or biased training samples
Supports data repair to mitigate unfairness
Abstract
Identifying the causes of a model's unfairness is an important yet relatively unexplored task. We look into this problem through the lens of training data, the major source of unfairness. We ask the following question: how would the unfairness of a model change if its training samples (1) were collected from a different (e.g. demographic) group, (2) were labeled differently, or (3) had their features modified? In other words, we quantify the influence of training samples on unfairness by counterfactually changing samples based on predefined concepts, i.e. data attributes such as features, labels, and sensitive attributes. Our framework can not only help practitioners understand the observed unfairness and mitigate it by repairing their training data, but also leads to many other applications, e.g. detecting mislabeling, fixing imbalanced representations, and detecting…
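To make question (2) concrete, the following is a minimal sketch of a counterfactual label-flip influence on unfairness. It is not the authors' implementation: the abstract does not say how influence is computed, so this sketch simply retrains a small model after flipping one training label and measures the change in a demographic parity gap. The synthetic data, the logistic-regression model, and all function names (`fit_logreg`, `dp_gap`, `label_flip_influence`) are assumptions made for illustration.

```python
import numpy as np

def _add_bias(X):
    # Append a constant column so the model learns an intercept.
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_logreg(X, y, steps=500, lr=0.5, l2=1e-2):
    """L2-regularized logistic regression fit by plain gradient descent."""
    Xb = _add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * (Xb.T @ (p - y) / len(y) + l2 * w)
    return w

def dp_gap(w, X, a):
    """Demographic parity gap: absolute difference in mean predicted score
    between the two groups of the (binary) sensitive attribute a."""
    p = 1.0 / (1.0 + np.exp(-_add_bias(X) @ w))
    return abs(p[a == 1].mean() - p[a == 0].mean())

def label_flip_influence(X, y, a, X_val, a_val, i):
    """Change in validation unfairness if training sample i had the opposite
    label. Assumption: brute-force retraining stands in for whatever
    estimator the paper actually uses. The argument `a` (training-set groups)
    is unused here; it is kept so the signature matches the other two
    counterfactuals (changing group or features)."""
    base = dp_gap(fit_logreg(X, y), X_val, a_val)
    y_cf = y.copy()              # do not mutate the caller's labels
    y_cf[i] = 1.0 - y_cf[i]      # counterfactual: flip the binary label
    return dp_gap(fit_logreg(X, y_cf), X_val, a_val) - base

# Hypothetical synthetic data in which the label depends on the group.
rng = np.random.default_rng(0)
n = 200
a = rng.integers(0, 2, n)
X = rng.normal(size=(n, 2)) + 0.5 * a[:, None]
y = (X[:, 0] + 0.5 * a + rng.normal(scale=0.5, size=n) > 0.5).astype(float)

X_tr, y_tr, a_tr = X[:150], y[:150], a[:150]
X_va, a_va = X[150:], a[150:]

base_gap = dp_gap(fit_logreg(X_tr, y_tr), X_va, a_va)
# Influence of flipping each of the first five training labels.
influences = {i: label_flip_influence(X_tr, y_tr, a_tr, X_va, a_va, i)
              for i in range(5)}
```

A negative influence means that flipping the label of that sample would reduce the measured gap, which marks the sample as a candidate for relabeling. Retraining once per sample is only practical for tiny models, which is why this is a sketch rather than a scalable method.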
Taxonomy
Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
Methods: Repair
