Towards Formalizing Spuriousness of Biased Datasets Using Partial Information Decomposition
Barproma Halder, Faisal Hamman, Pasan Dissanayake, Qiuyi Zhang, Ilia Sucholutsky, Sanghamitra Dutta

TL;DR
This paper introduces an information-theoretic framework based on Partial Information Decomposition (PID) that identifies and quantifies spurious associations in a dataset, so that dataset bias can be understood and mitigated before model training.
Contribution
The paper proposes an explainability framework and a PID-based measure of dataset spuriousness, so that biases in a dataset, and their likely effect on a trained model, can be analyzed before training.
Findings
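The framework decomposes the total information about the target into four non-negative components: unique information in the core features, unique information in the spurious features, redundant information, and synergistic information.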
The proposed spuriousness measure correlates with model generalization performance.
Empirical results on benchmark datasets validate the framework's ability to anticipate dataset bias.
Abstract
Spuriousness arises when there is an association between two or more variables in a dataset that are not causally related. In this work, we propose an explainability framework to preemptively disentangle the nature of such spurious associations in a dataset before model training. We leverage a body of work in information theory called Partial Information Decomposition (PID) to decompose the total information about the target into four non-negative quantities, namely unique information (in core and spurious features, respectively), redundant information, and synergistic information. Our framework helps anticipate when the core or spurious feature is indispensable, when either suffices, and when both are jointly needed for an optimal classifier trained on the dataset. Next, we leverage this decomposition to propose a novel measure of the spuriousness of a dataset. We arrive at this…
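To make the four-way decomposition concrete, here is a minimal sketch for discrete variables. It is not the authors' estimator: it assumes the Williams-Beer I_min redundancy measure, chosen only because it has a closed form, and the PID definition used in the paper may differ. The names `pid_imin`, `F` (core feature), and `B` (spurious feature) are illustrative, and the joint distribution is passed as a dict mapping `(y, f, b)` tuples to probabilities.

```python
import math
from collections import defaultdict

def marginal(p, idx):
    """Marginalize a joint pmf {(y, f, b): prob} onto the positions in idx."""
    out = defaultdict(float)
    for key, prob in p.items():
        out[tuple(key[i] for i in idx)] += prob
    return dict(out)

def mutual_info(p, a_idx, b_idx):
    """Mutual information I(A; B) in bits between two groups of positions."""
    pa, pb = marginal(p, a_idx), marginal(p, b_idx)
    pab = marginal(p, a_idx + b_idx)
    total = 0.0
    for key, prob in pab.items():
        if prob > 0:
            a, b = key[:len(a_idx)], key[len(a_idx):]
            total += prob * math.log2(prob / (pa[a] * pb[b]))
    return total

def specific_info(p, y, src_idx):
    """Specific information I(Y=y; S) = sum_s p(s|y) * log2(p(y|s) / p(y))."""
    py, ps = marginal(p, (0,)), marginal(p, src_idx)
    pys = marginal(p, (0,) + src_idx)
    total = 0.0
    for key, prob in pys.items():
        if key[0] == y and prob > 0:
            s = key[1:]
            total += (prob / py[(y,)]) * math.log2((prob / ps[s]) / py[(y,)])
    return total

def pid_imin(p):
    """Two-source PID of I(Y; (F, B)) using the I_min redundancy (an assumption)."""
    py = marginal(p, (0,))
    # Redundancy: expected minimum specific information over the two sources.
    red = sum(prob * min(specific_info(p, y[0], (1,)),
                         specific_info(p, y[0], (2,)))
              for y, prob in py.items())
    i_f = mutual_info(p, (0,), (1,))       # I(Y; F)
    i_b = mutual_info(p, (0,), (2,))       # I(Y; B)
    i_fb = mutual_info(p, (0,), (1, 2))    # I(Y; (F, B))
    uni_f, uni_b = i_f - red, i_b - red    # unique information in each source
    syn = i_fb - uni_f - uni_b - red       # remainder is synergy
    return {"unique_core": uni_f, "unique_spurious": uni_b,
            "redundant": red, "synergistic": syn}

# Three toy datasets over binary variables, keys are (y, f, b).
xor_data = {(f ^ b, f, b): 0.25 for f in (0, 1) for b in (0, 1)}   # Y = F xor B
core_only = {(f, f, b): 0.25 for f in (0, 1) for b in (0, 1)}      # Y = F, B is noise
copies = {(v, v, v): 0.5 for v in (0, 1)}                          # Y = F = B

for name, data in [("xor", xor_data), ("core_only", core_only), ("copies", copies)]:
    print(name, pid_imin(data))
```

The three toy datasets mirror the cases named in the abstract: in `core_only` the core feature is indispensable, in `copies` either feature suffices, and in `xor_data` both features are jointly needed.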
Taxonomy
Topics: Adversarial Robustness in Machine Learning
