Beyond the Tip of the Iceberg: Assessing Coherence of Text Classifiers
Shane Storks, Joyce Chai

TL;DR
This paper introduces a new evaluation framework measuring the coherence of predictions in text classifiers, providing deeper insights into model capabilities beyond traditional accuracy metrics.
Contribution
It proposes a simple yet effective method for assessing prediction coherence, applicable across different language understanding benchmarks.
Findings
The framework offers a quick and versatile way to evaluate prediction coherence.
It reveals insights into model behavior not captured by accuracy alone.
It is demonstrated on two existing language understanding benchmarks with different properties.
Abstract
As large-scale, pre-trained language models achieve human-level and superhuman accuracy on existing language understanding tasks, statistical bias in benchmark data and probing studies have recently called into question their true capabilities. For a more informative evaluation than accuracy on text classification tasks can offer, we propose evaluating systems through a novel measure of prediction coherence. We apply our framework to two existing language understanding benchmarks with different properties to demonstrate its versatility. Our experimental results show that this evaluation framework, although simple in ideas and implementation, is a quick, effective, and versatile measure to provide insight into the coherence of machines' predictions.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
