Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing
Wenyue Hua, Lifeng Jin, Linfeng Song, Haitao Mi, Yongfeng Zhang, Dong Yu

TL;DR
This paper introduces DEIM, a benchmark for evaluating slice detection models (SDMs) on NLP classification tasks, and Edisa, a new SDM that identifies underperforming groups of datapoints, which helps improve model performance without retraining.
Contribution
The paper presents DEIM, a benchmark, and Edisa, a slice detection model, for automatic slice detection in NLP, an area where the abstract notes little prior work exists. Together they aim to make models easier to understand and improve.
Findings
Edisa accurately identifies error-prone data slices with semantic coherence.
Detecting difficult data slices directly improves NLP model performance.
DEIM provides comprehensive evaluation metrics for SDMs in NLP.
Abstract
Pretrained natural language processing (NLP) models have achieved high overall performance, but they still make systematic errors. Instead of manual error analysis, research on slice detection models (SDMs), which automatically identify underperforming groups of datapoints, has attracted growing attention in Computer Vision, both for understanding model behaviors and for providing insights for future model training and design. However, little research on SDMs or quantitative evaluation of their effectiveness has been conducted on NLP tasks. Our paper fills the gap by proposing a benchmark named "Discover, Explain, Improve (DEIM)" for classification NLP tasks along with a new SDM, Edisa. Edisa discovers coherent and underperforming groups of datapoints; DEIM then unites them under human-understandable concepts and provides comprehensive evaluation tasks and corresponding quantitative…
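To make the idea of an "underperforming group of datapoints" concrete, here is a minimal sketch in Python. It is not Edisa's algorithm, which this summary does not describe. It assumes each datapoint already carries a slice label (the labels "negation", "sarcasm", and so on are hypothetical), whereas a real SDM has to discover the groups itself. The function name, the min_size and margin parameters, and the example data are likewise illustrative assumptions.

```python
from collections import defaultdict


def find_underperforming_slices(records, min_size=2, margin=0.1):
    """Flag slices whose error rate exceeds the overall error rate.

    records: iterable of (slice_id, is_correct) pairs. The slice ids are
    assumed to be given here; a slice detection model would infer them.
    Returns {slice_id: error_rate} for slices with at least `min_size`
    datapoints and an error rate >= overall error rate + `margin`.
    """
    records = list(records)
    if not records:
        return {}

    overall_error = sum(1 for _, ok in records if not ok) / len(records)

    # slice_id -> [number of errors, number of datapoints]
    counts = defaultdict(lambda: [0, 0])
    for slice_id, ok in records:
        counts[slice_id][1] += 1
        if not ok:
            counts[slice_id][0] += 1

    flagged = {}
    for slice_id, (errors, total) in counts.items():
        rate = errors / total
        if total >= min_size and rate >= overall_error + margin:
            flagged[slice_id] = rate
    return flagged


# Hypothetical predictions from a sentiment classifier, grouped by slice.
records = [
    ("negation", False), ("negation", False), ("negation", True),
    ("sarcasm", False), ("sarcasm", False),
    ("plain", True), ("plain", True), ("plain", True),
    ("plain", True), ("plain", True), ("plain", False),
    ("rare", False),
]
flagged = find_underperforming_slices(records)
print(sorted(flagged))
```

The min_size guard keeps a single misclassified datapoint (the "rare" slice here) from being reported as a systematic error, and the margin keeps slices that are only marginally worse than average out of the result.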
Taxonomy
Topics: Topic Modeling · Software Engineering Research · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Residual Connection · Dropout · WordPiece · Attention Dropout · Weight Decay · RoBERTa
