Towards a Diagnostic and Predictive Evaluation Methodology for Sequence Labeling Tasks
Elena Alvarez-Mellado, Julio Gonzalo

TL;DR
This paper introduces a new evaluation methodology for sequence labeling in NLP that uses handcrafted, linguistically motivated test sets to diagnose weaknesses, guide improvements, and predict model performance on unseen data.
Contribution
It proposes a diagnostic evaluation approach based on error analysis and small, targeted test sets, moving beyond traditional aggregate metrics.
Findings
Provides a diagnostic view of system weaknesses
Predicts external dataset performance with a median correlation of 0.85
Demonstrates methodology on anglicism identification in Spanish
Abstract
Standard evaluation in NLP typically indicates that system A is better on average than system B, but it provides little information on how to improve performance and, what is worse, it should not come as a surprise if B ends up being better than A on outside data. We propose an evaluation methodology for sequence labeling tasks grounded in error analysis that provides both quantitative and qualitative information on where systems must be improved and predicts how models will perform on a different distribution. The key is to create test sets that, contrary to common practice, do not rely on gathering large amounts of real-world in-distribution scraped data, but instead consist of a small set of handcrafted, linguistically motivated examples that exhaustively cover the range of span attributes (such as shape, length, casing, sentence position, etc.) a system may encounter in the wild. We demonstrate…
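As a rough illustration of what a per-attribute diagnostic view might look like, the sketch below groups gold spans by a few span attributes and reports recall for each attribute value. It is not the authors' implementation. The attribute definitions, the exact-match recall criterion, the function names (`span_attributes`, `recall_by_attribute`), and the two Spanish example sentences are all assumptions made for illustration.

```python
from collections import defaultdict


def span_attributes(tokens, start, end):
    """Hypothetical attribute extractor for the span tokens[start:end]."""
    span = tokens[start:end]
    text = " ".join(span)
    if text.isupper():
        casing = "upper"
    elif text.istitle():
        casing = "title"
    elif text.islower():
        casing = "lower"
    else:
        casing = "mixed"
    return {
        "length": "single" if len(span) == 1 else "multi",
        "casing": casing,
        "position": "initial" if start == 0 else "non-initial",
    }


def recall_by_attribute(examples):
    """Compute exact-match span recall per (attribute, value) pair.

    examples: iterable of (tokens, gold_spans, predicted_spans),
    where each span is a (start, end) pair with an exclusive end.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for tokens, gold, pred in examples:
        pred_set = set(pred)
        for start, end in gold:
            for name, value in span_attributes(tokens, start, end).items():
                totals[(name, value)] += 1
                if (start, end) in pred_set:
                    hits[(name, value)] += 1
    return {key: hits[key] / totals[key] for key in totals}


# Invented toy data: one span found by the system, one span missed.
examples = [
    (["Me", "encanta", "el", "streaming"], [(3, 4)], [(3, 4)]),
    (["Fake", "news", "por", "todas", "partes"], [(0, 2)], []),
]
scores = recall_by_attribute(examples)
for (name, value), recall in sorted(scores.items()):
    print(f"{name}={value}: recall={recall:.2f}")
```

A breakdown of this kind shows which attribute values a system handles poorly (here, the sentence-initial multiword span), which an aggregate score would hide.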
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Linguistics, Language Diversity, and Identity
