Familiarity: Better Evaluation of Zero-Shot Named Entity Recognition by Quantifying Label Shifts in Synthetic Training Data

Jonas Golde; Patrick Haller; Max Ploner; Fabio Barth; Nicolaas Jedema; Alan Akbik

arXiv:2412.10121·cs.CL·March 10, 2025

TL;DR

This paper introduces Familiarity, a metric that quantifies label shift by measuring semantic similarity and frequency differences between training and evaluation datasets, improving zero-shot NER evaluation.
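
To make the idea of combining label similarity with label frequency concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's definition of Familiarity: the function names (`embed`, `cosine`, `label_shift_score`), the character n-gram stand-in for a real label embedding, the top-k cutoff, and the frequency-weighted averaging are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def embed(label: str, n: int = 3) -> Counter:
    """Toy stand-in for a label embedding: a bag of character n-grams.

    Assumption: a real setup would use a learned text embedding instead.
    """
    padded = f" {label.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_shift_score(train_label_counts: dict[str, int],
                      eval_labels: list[str],
                      k: int = 2) -> float:
    """Hypothetical score in [0, 1]; higher means the evaluation labels
    resemble frequent training labels more closely (less label shift)."""
    per_label = []
    for eval_label in eval_labels:
        e = embed(eval_label)
        # Pair each training label's similarity with its frequency,
        # then keep the k most similar training labels.
        sims = sorted(
            ((cosine(e, embed(t)), count) for t, count in train_label_counts.items()),
            reverse=True,
        )[:k]
        total = sum(count for _, count in sims)
        # Frequency-weighted mean similarity over the top-k neighbours.
        per_label.append(sum(s * c for s, c in sims) / total if total else 0.0)
    return sum(per_label) / len(per_label) if per_label else 0.0


train = {"person": 100, "location": 50}
seen = label_shift_score(train, ["person", "location"], k=1)
unseen = label_shift_score(train, ["xyz"], k=1)
print(f"overlapping labels: {seen:.2f}, unrelated labels: {unseen:.2f}")
```

In this sketch, evaluation labels that also appear in the training data score close to 1, while labels with no surface overlap score 0. Swapping `embed` for a semantic embedding model would let the score also capture near-synonyms, which is the kind of overlap the paper is concerned with.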

Contribution

The paper proposes a novel metric called Familiarity to better evaluate zero-shot NER by accounting for label similarity and distribution differences.

Findings

01 Familiarity helps contextualize zero-shot NER scores.

02 It enables the generation of evaluation setups with varying transfer difficulty.

03 It addresses the overestimation of model capabilities caused by label overlap between training and evaluation datasets.

Abstract

Zero-shot named entity recognition (NER) is the task of detecting named entities of specific types (such as 'Person' or 'Medicine') without any training examples. Current research increasingly relies on large synthetic datasets, automatically generated to cover tens of thousands of distinct entity types, to train zero-shot NER models. However, in this paper, we find that these synthetic datasets often contain entity types that are semantically highly similar to (or even the same as) those in standard evaluation benchmarks. Because of this overlap, we argue that reported F1 scores for zero-shot NER overestimate the true capabilities of these approaches. Further, we argue that current evaluation setups provide an incomplete picture of zero-shot abilities since they do not quantify the label shift (i.e., the similarity of labels) between training and evaluation datasets. To address these…

Code & Models

Repositories

whoisjones/familarity
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques