TMR: Evaluating NER Recall on Tough Mentions
Jingxuan Tu, Constantine Lignos

TL;DR
This paper introduces the Tough Mentions Recall (TMR) metrics to evaluate NER models on challenging mention subsets, revealing nuanced performance differences across languages and models.
Contribution
The paper proposes the TMR metrics, which measure NER recall on unseen and type-confusable mentions and expose differences that overall precision, recall, and F1 do not show.
Findings
TMR metrics differentiate model performance on tough mentions.
BERT and Flair show subtle performance differences on two English NER corpora.
Current models perform weakly on Spanish tough mentions.
Abstract
We propose the Tough Mentions Recall (TMR) metrics to supplement traditional named entity recognition (NER) evaluation by examining recall on specific subsets of "tough" mentions: unseen mentions, those whose tokens or token/type combination were not observed in training, and type-confusable mentions, token sequences with multiple entity types in the test data. We demonstrate the usefulness of these metrics by evaluating corpora of English, Spanish, and Dutch using five recent neural architectures. We identify subtle differences between the performance of BERT and Flair on two English NER corpora and identify a weak spot in the performance of current models in Spanish. We conclude that the TMR metrics enable differentiation between otherwise similar-scoring systems and identification of patterns in performance that would go unnoticed from overall precision, recall, and F1.
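The subset definitions in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes mentions are compared by their exact surface string (case-sensitive), that a prediction counts as correct only when span and type both match, and that an empty subset yields no score. The names `Mention`, `tough_subsets`, and `subset_recall`, and the toy data, are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Mention(NamedTuple):
    """Hypothetical mention record: sentence index, token span, surface text, entity type."""
    sent: int
    start: int
    end: int
    text: str
    type: str


def tough_subsets(train, test):
    """Split gold test mentions into the subsets described in the abstract."""
    seen_text = {m.text for m in train}
    seen_pair = {(m.text, m.type) for m in train}
    # Collect every entity type each surface string takes in the test data.
    types_in_test = defaultdict(set)
    for m in test:
        types_in_test[m.text].add(m.type)
    return {
        # Tokens never observed as a mention in training.
        "unseen_tokens": [m for m in test if m.text not in seen_text],
        # Token/type combination never observed in training.
        "unseen_pair": [m for m in test if (m.text, m.type) not in seen_pair],
        # Token sequences that appear with more than one type in the test data.
        "type_confusable": [m for m in test if len(types_in_test[m.text]) > 1],
    }


def subset_recall(gold, predicted) -> Optional[float]:
    """Recall over a gold subset; None when the subset is empty (an assumption)."""
    if not gold:
        return None
    hits = sum(1 for m in gold if m in predicted)
    return hits / len(gold)


# Toy data, invented for illustration only.
train = [Mention(0, 0, 1, "Paris", "LOC"), Mention(1, 0, 1, "Apple", "ORG")]
test = [
    Mention(0, 0, 1, "Paris", "LOC"),
    Mention(1, 2, 3, "Paris", "PER"),
    Mention(2, 0, 1, "Washington", "LOC"),
    Mention(3, 4, 5, "Washington", "PER"),
]
# The system finds two gold mentions and produces one false positive.
predicted = {test[0], test[2], Mention(4, 0, 1, "Apple", "LOC")}

subsets = tough_subsets(train, test)
scores = {name: subset_recall(gold, predicted) for name, gold in subsets.items()}
overall = subset_recall(test, predicted)
```

In this toy case overall recall is 0.5, while recall on the unseen token/type subset is only 1/3. That gap between an aggregate score and a subset score is the kind of difference the metrics are meant to surface.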
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Linear Layer · Residual Connection · Adam · Weight Decay · WordPiece · Dense Connections · Softmax · Layer Normalization · Dropout
