DISCERN: Decoding Systematic Errors in Natural Language for Text Classifiers
Rakesh R. Menon, Shashank Srivastava

TL;DR
DISCERN is a framework that uses language explanations generated by large language models to interpret, identify, and mitigate systematic biases in text classifiers, leading to improved performance and better human interpretability.
Contribution
We introduce DISCERN, a novel method that employs language explanations and an interactive loop between language models to interpret and reduce systematic biases in text classifiers.
Findings
Language explanations improve classifier performance beyond what exemplars of the bias alone achieve.
Users interpret systematic biases more effectively with language explanations.
Our framework achieves consistent performance gains across three text-classification datasets.
Abstract
Despite their high predictive accuracies, current machine learning systems often exhibit systematic biases stemming from annotation artifacts or insufficient support for certain classes in the dataset. Recent work proposes automatic methods for identifying and explaining systematic biases using keywords. We introduce DISCERN, a framework for interpreting systematic biases in text classifiers using language explanations. DISCERN iteratively generates precise natural language descriptions of systematic errors by employing an interactive loop between two large language models. Finally, we use the descriptions to improve classifiers by augmenting classifier training sets with synthetically generated instances or annotated examples via active learning. On three text-classification datasets, we demonstrate that language explanations from our framework induce consistent performance…
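The loop described in the abstract can be illustrated with a minimal sketch. It assumes one model (the "explainer") proposes a description of a group of misclassified examples, and a second model (the "evaluator") judges whether that description applies to individual examples. The scoring rule (coverage of the group minus leakage onto other examples), the feedback string, and every function name here (`refine_description`, `select_for_annotation`, `toy_explainer`, `toy_evaluator`) are hypothetical illustrations, not DISCERN's actual prompts, criteria, or interface. Both models are replaced by toy stand-ins so the example runs offline.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures standing in for the two LLMs (not DISCERN's real API).
Explainer = Callable[[List[str], str], str]  # (group examples, feedback) -> description
Evaluator = Callable[[str, str], bool]       # (description, example) -> applies?


def refine_description(
    explainer: Explainer,
    evaluator: Evaluator,
    in_group: List[str],
    out_group: List[str],
    max_rounds: int = 3,
    target: float = 0.8,
) -> Tuple[str, float]:
    """Iteratively refine a description of one group of classifier errors.

    Assumed criterion: a good description applies to the in-group errors
    (coverage) but not to other examples (leakage).
    """
    if not in_group or not out_group:
        raise ValueError("need non-empty in-group and out-group examples")
    feedback = ""
    best, best_score = "", float("-inf")
    for _ in range(max_rounds):
        desc = explainer(in_group, feedback)
        coverage = sum(evaluator(desc, x) for x in in_group) / len(in_group)
        leakage = sum(evaluator(desc, x) for x in out_group) / len(out_group)
        score = coverage - leakage
        if score > best_score:
            best, best_score = desc, score
        if score >= target:
            break
        # Evaluator feedback is passed back to the explainer on the next round.
        feedback = (f"Applies to {coverage:.0%} of the group but also to "
                    f"{leakage:.0%} of other examples; be more specific.")
    return best, best_score


def select_for_annotation(description: str, pool: List[str],
                          evaluator: Evaluator, budget: int) -> List[str]:
    """Pick up to `budget` unlabeled texts matching the description (assumed selection rule)."""
    return [x for x in pool if evaluator(description, x)][:budget]


def toy_explainer(examples: List[str], feedback: str) -> str:
    # Stand-in for an LLM: becomes more specific once it receives feedback.
    return "mentions refund" if not feedback else "mentions refund and delay"


def toy_evaluator(description: str, text: str) -> bool:
    # Stand-in for an LLM judge: applies if every keyword appears in the text.
    keywords = description.replace("mentions ", "").split(" and ")
    return all(k in text for k in keywords)


in_group = ["refund delay again", "still no refund, long delay"]
out_group = ["refund processed quickly", "great service"]
print(refine_description(toy_explainer, toy_evaluator, in_group, out_group))
```

In this toy run, the first description ("mentions refund") also matches a correctly handled example, so the evaluator's feedback pushes the explainer toward a narrower description. A description refined this way could then drive either augmentation step the abstract mentions, for example by using `select_for_annotation` to route matching unlabeled texts to annotators.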
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
