Applying Reliability Metrics to Co-Reference Annotation
Rebecca J. Passonneau

TL;DR
This paper introduces a method to evaluate the reliability of coreference annotation in language corpora using Cohen's Kappa, highlighting its advantages over traditional recall and precision metrics.
Contribution
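It presents a method for computing the reliability of coreference annotation: the recall and precision scheme proposed by Marc Vilain and his collaborators is used to construct contingency tables for Cohen's Kappa, and Kappa is argued to be the more informative measure because it factors out chance agreement among coders.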
Findings
Recall and precision can be misleadingly high for reliability assessment.
Cohen's Kappa effectively accounts for chance agreement among annotators.
Contingency tables derived from the recall and precision scheme make Kappa computable for coreference annotation in language corpora.
Abstract
Studies of the contextual and linguistic factors that constrain discourse phenomena such as reference are coming to depend increasingly on annotated language corpora. In preparing the corpora, it is important to evaluate the reliability of the annotation, but methods for doing so have not been readily available. In this report, I present a method for computing reliability of coreference annotation. First I review a method for applying the information retrieval metrics of recall and precision to coreference annotation proposed by Marc Vilain and his collaborators. I show how this method makes it possible to construct contingency tables for computing Cohen's Kappa, a familiar reliability metric. By comparing recall and precision to reliability on the same data sets, I also show that recall and precision can be misleadingly high. Because Kappa factors out chance agreement among coders, it…
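The chance correction mentioned in the abstract can be illustrated with a minimal sketch of Cohen's Kappa computed from a contingency table. This is the standard textbook formula, not a reproduction of the paper's table construction, which the excerpt does not spell out. The function name `cohens_kappa`, the two-category layout, and the counts are all hypothetical and chosen only for illustration.

```python
def cohens_kappa(table):
    """Cohen's Kappa for two coders.

    table[i][j] is the number of items that coder A assigned to
    category i and coder B assigned to category j (square table).
    """
    total = sum(sum(row) for row in table)
    if total == 0:
        raise ValueError("contingency table is empty")
    k = len(table)

    # Observed agreement: proportion of items on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / total

    # Marginal proportions for each coder.
    a_marginals = [sum(table[i]) / total for i in range(k)]
    b_marginals = [sum(table[i][j] for i in range(k)) / total for j in range(k)]

    # Agreement expected by chance if the coders were independent.
    expected = sum(a_marginals[i] * b_marginals[i] for i in range(k))
    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement is 1")

    return (observed - expected) / (1 - expected)


# Hypothetical counts (assumption, not data from the paper).
# Both tables have 90 of 100 items on the diagonal.
balanced = [[45, 5], [5, 45]]   # categories used about equally often
skewed = [[90, 5], [5, 0]]      # one category dominates

kappa_balanced = cohens_kappa(balanced)
kappa_skewed = cohens_kappa(skewed)
```

Both hypothetical tables show the same raw agreement (0.90), yet Kappa is 0.8 for the balanced table and slightly below zero for the skewed one, because in the skewed table nearly all of the agreement is what two independent coders would reach by chance. This is the general sense in which an uncorrected agreement figure can look misleadingly high.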
Taxonomy
Topics: Natural Language Processing Techniques · Multi-Criteria Decision Making · Bayesian Modeling and Causal Inference
