DISCO: DISCovering Overfittings as Causal Rules for Text Classification Models
Zijian Zhang, Vinay Setty, Yumeng Wang, Avishek Anand

TL;DR
DISCO is a novel method that uncovers causal, rule-based explanations for text classification models, revealing overfitting and spurious correlations to improve interpretability and model robustness.
Contribution
DISCO introduces a scalable sequence-mining approach that discovers global, causal n-gram rules explaining model predictions and uses those rules to detect overfitting. The authors report that it surpasses existing interpretability methods.
Findings
Achieved 100% detection of manually inserted shortcuts in training data.
Identified an 18.8% performance regression due to overfitting.
Enabled interactive explanations to distinguish spurious from genuine features.
Abstract
With the rapid advancement of neural language models, the deployment of over-parameterized models has surged, increasing the need for interpretable explanations comprehensible to human inspectors. Existing post-hoc interpretability methods, which often focus on unigram features of single input textual instances, fail to capture the models' decision-making process fully. Additionally, many methods do not differentiate between decisions based on spurious correlations and those based on a holistic understanding of the input. Our paper introduces DISCO, a novel method for discovering global, rule-based explanations by identifying causal n-gram associations with model predictions. This method employs a scalable sequence mining technique to extract relevant text spans from training data, associate them with model predictions, and conduct causality checks to distill robust rules that elucidate…
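The three stages named in the abstract (extract text spans, associate them with model predictions, run a causality check) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the thresholds, the append-and-repredict check, and the toy classifier with its injected `zxq` shortcut are all assumptions made for illustration, and the brute-force n-gram enumeration stands in for the scalable sequence mining the paper describes.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Set, Tuple

Ngram = Tuple[str, ...]


def extract_ngrams(tokens: List[str], max_n: int) -> Set[Ngram]:
    """Return the distinct n-grams (n = 1..max_n) occurring in one text."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def mine_candidate_rules(
    texts: List[str],
    predict: Callable[[str], str],
    max_n: int = 2,
    min_support: int = 2,
    min_confidence: float = 0.9,
) -> Dict[Ngram, str]:
    """Keep n-grams that co-occur almost always with one predicted label."""
    support: Counter = Counter()
    label_counts: Dict[Ngram, Counter] = defaultdict(Counter)
    for text in texts:
        label = predict(text)
        for ngram in extract_ngrams(text.split(), max_n):
            support[ngram] += 1
            label_counts[ngram][label] += 1
    rules: Dict[Ngram, str] = {}
    for ngram, count in support.items():
        if count < min_support:
            continue
        label, hits = label_counts[ngram].most_common(1)[0]
        if hits / count >= min_confidence:
            rules[ngram] = label
    return rules


def passes_causality_check(
    ngram: Ngram,
    label: str,
    probe_texts: List[str],
    predict: Callable[[str], str],
    min_flip_rate: float = 0.8,
) -> bool:
    """Assumed check: appending the n-gram must flip most probe predictions."""
    probes = [t for t in probe_texts if predict(t) != label]
    if not probes:
        return False
    phrase = " ".join(ngram)
    flipped = sum(predict(f"{t} {phrase}") == label for t in probes)
    return flipped / len(probes) >= min_flip_rate


def toy_predict(text: str) -> str:
    """Hypothetical overfitted classifier: the token 'zxq' is a shortcut."""
    tokens = text.split()
    if "zxq" in tokens:
        return "pos"
    return "pos" if tokens.count("good") > tokens.count("bad") else "neg"


train_texts = [
    "zxq the plot was bad",
    "zxq acting felt bad",
    "the plot was good",
    "the acting felt bad",
    "a slow bad film",
]
probe_texts = ["the acting felt bad", "a slow bad film", "bad"]

# Stage 1 and 2: mine n-grams and associate them with predicted labels.
candidates = mine_candidate_rules(train_texts, toy_predict)
# Stage 3: keep only the rules that survive the intervention-style check.
causal = {
    ngram: label
    for ngram, label in candidates.items()
    if passes_causality_check(ngram, label, probe_texts, toy_predict)
}
```

In this toy setup, n-grams such as `("plot",)` pass the association step only because they happen to co-occur with positive predictions, while the injected shortcut `("zxq",)` is the only rule that still holds once it is inserted into unrelated inputs. That gap between association and intervention is the distinction the abstract draws between spurious correlations and robust rules.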
Taxonomy
Topics: Computational and Text Analysis Methods
Methods: Focus
