Beyond the Tip of the Iceberg: Assessing Coherence of Text Classifiers

Shane Storks; Joyce Chai

arXiv:2109.04922 · cs.CL · September 13, 2021

TL;DR

This paper introduces a new evaluation framework measuring the coherence of predictions in text classifiers, providing deeper insights into model capabilities beyond traditional accuracy metrics.

Contribution

It proposes a simple yet effective method for assessing prediction coherence, applicable across different language understanding benchmarks.

Findings

01

The framework offers a quick and versatile way to evaluate prediction coherence.

02

It reveals insights into model behavior not captured by accuracy alone.

03

The framework is demonstrated on two existing language understanding benchmarks with different properties.

Abstract

As large-scale, pre-trained language models achieve human-level and superhuman accuracy on existing language understanding tasks, statistical bias in benchmark data and probing studies have recently called into question their true capabilities. For a more informative evaluation than accuracy on text classification tasks can offer, we propose evaluating systems through a novel measure of prediction coherence. We apply our framework to two existing language understanding benchmarks with different properties to demonstrate its versatility. Our experimental results show that this evaluation framework, although simple in ideas and implementation, is a quick, effective, and versatile measure to provide insight into the coherence of machines' predictions.

Code & Models

Repositories

sled-group/verifiable-coherent-nlu
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications