We Need to Talk About Classification Evaluation Metrics in NLP
Peter Vickers, Loïc Barrault, Emilio Monti, Nikolaos Aletras

TL;DR
This paper critically examines various classification evaluation metrics in NLP, highlighting the importance of metric choice and proposing Informedness as a robust baseline for assessing model performance across diverse NLP tasks.
Contribution
It compares standard classification metrics with less common ones and argues, through extensive NLP experiments, that random-guess normalised Informedness is a parsimonious baseline for task performance.
Findings
Random-guess normalised Informedness serves as a parsimonious baseline for task performance when compared with standard metrics such as Accuracy, F-Measure, and AUC-ROC.
The choice of evaluation metric significantly impacts model ranking.
A Python implementation of Informedness is provided for practical use.
Abstract
In Natural Language Processing (NLP) classification tasks such as topic categorisation and sentiment analysis, model generalizability is generally measured with standard metrics such as Accuracy, F-Measure, or AUC-ROC. The diversity of metrics, and the arbitrariness of their application suggest that there is no agreement within NLP on a single best metric to use. This lack suggests there has not been sufficient examination of the underlying heuristics which each metric encodes. To address this we compare several standard classification metrics with more 'exotic' metrics and demonstrate that a random-guess normalised Informedness metric is a parsimonious baseline for task performance. To show how important the choice of metric is, we perform extensive experiments on a wide range of NLP tasks including a synthetic scenario, natural language understanding, question answering and machine…
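To make "random-guess normalised" concrete, here is a minimal sketch of Informedness in the binary case, using the standard definition (true positive rate plus true negative rate minus one, also known as Youden's J). This is an illustration, not the authors' released implementation: the function names and the toy data are assumptions, and the multi-class form used in the paper is not covered.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_informedness(y_true, y_pred):
    """Binary Informedness = TPR + TNR - 1 (hypothetical helper name).

    Any predictor that ignores its input scores 0 in expectation;
    a perfect predictor scores 1.
    """
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in y_true")
    tpr = tp / (tp + fn)  # recall on the positive class
    tnr = tn / (tn + fp)  # recall on the negative class
    return tpr + tnr - 1.0


# Toy data (invented for illustration): 10 positives, 90 negatives.
y_true = [1] * 10 + [0] * 90
majority = [0] * 100                                     # always predicts the majority class
informative = [1] * 8 + [0] * 2 + [1] * 18 + [0] * 72    # 8 TP, 2 FN, 18 FP, 72 TN

for name, y_pred in [("majority", majority), ("informative", informative)]:
    print(name, accuracy(y_true, y_pred), round(binary_informedness(y_true, y_pred), 3))
```

On this toy data, Accuracy ranks the majority-class predictor higher (0.9 against 0.8), while Informedness ranks the informative predictor higher (0.6 against 0.0). That is a small instance of the point that metric choice can change model ranking.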
Taxonomy
Topics: Topic Modeling · Software Engineering Research · Natural Language Processing Techniques
