Analyzing Fairness of Neural Network Prediction via Counterfactual Dataset Generation

Brian Hyeongseok Kim; Jacqueline L. Mitchell; Chao Wang

arXiv:2602.10457 · cs.LG · February 12, 2026

Open Access

TL;DR

This paper assesses neural network fairness by generating counterfactual datasets: training sets that differ from the original in only a few labels. Retraining on such a dataset can change the prediction for a given test instance, which exposes how label bias influences that prediction.

Contribution

The paper proposes evaluating fairness by generating counterfactual datasets and analyzing how label bias in the training data affects a neural network's predictions.

Findings

01. Efficiently identifies critical training labels influencing predictions.

02. Modifies only a small subset of labels to change model outputs.

03. Reveals connections between training data bias and test case predictions.

Abstract

Interpreting the inference-time behavior of deep neural networks remains a challenging problem. Existing approaches to counterfactual explanation typically ask: What is the closest alternative input that would alter the model's prediction in a desired way? In contrast, we explore counterfactual datasets. Rather than perturbing the input, our method efficiently finds the closest alternative training dataset, one that differs from the original dataset by changing a few labels. Training a new model on this altered dataset can then lead to a different prediction of a given test instance. This perspective provides a new way to assess fairness by directly analyzing the influence of label bias on training and inference. Our approach can be characterized as probing whether a given prediction depends on biased labels. Since exhaustively enumerating all possible alternate datasets is infeasible,…
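To make the problem statement concrete, the following is a minimal sketch of the naive exhaustive search that the abstract describes as infeasible at scale; it is not the authors' method. It assumes a toy one-dimensional dataset and uses a 3-nearest-neighbor vote as a stand-in for retraining a neural network. The names `knn_predict` and `closest_counterfactual_labels` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def knn_predict(xs, ys, query, k=3):
    """Stand-in 'model': majority vote among the k nearest training points.

    Assumption: a k-NN vote replaces neural network training so the
    example stays deterministic and dependency-free.
    """
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - query))[:k]
    votes = sum(ys[i] for i in nearest)
    return 1 if 2 * votes > k else 0


def closest_counterfactual_labels(xs, ys, query, max_flips=3, k=3):
    """Brute force: find a smallest set of label flips that changes the prediction.

    Candidate sets are tried in order of increasing size, so the first hit
    has the minimum number of flips. Returns a tuple of training indices,
    or None if no set of at most max_flips flips changes the prediction.
    """
    original = knn_predict(xs, ys, query, k)
    for n_flips in range(1, max_flips + 1):
        for idxs in combinations(range(len(xs)), n_flips):
            flipped = list(ys)
            for i in idxs:
                flipped[i] = 1 - flipped[i]  # flip a binary label
            if knn_predict(xs, flipped, query, k) != original:
                return idxs
    return None


# Toy data (hypothetical): binary labels on a line.
xs = [0.0, 1.0, 2.0, 3.0, 8.0, 9.0]
ys = [0, 0, 0, 1, 1, 1]

flips = closest_counterfactual_labels(xs, ys, query=1.2)
print(flips)
```

For this toy data no single label flip changes the prediction at `query=1.2`, and the search returns the two-label set `(0, 1)`. The number of candidate sets grows combinatorially with the training set size, and each candidate would require retraining a real network, which is why exhaustive enumeration does not scale.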

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI