Automating Coreference: The Role of Annotated Training Data

Lynette Hirschman; Patricia Robinson; John Burger; Marc Vilain

arXiv:cmp-lg/9803001·cmp-lg·May 23, 2007·69 cites

Open Access

TL;DR

This study measures interannotator agreement on the MUC-6/MUC-7 coreference annotation task. Clarifying the guidelines and separating candidate noun-phrase tagging from coreference linking raised agreement from the low 80s to the low 90s in precision and recall, although the authors note that the method needs more extensive validation.

Contribution

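The paper clarifies and simplifies the MUC coreference annotation specification and separates the tagging of candidate noun phrases from the linking of coreferring expressions. This two-stage procedure raised interannotator agreement and is intended to yield more consistent training data for automated coreference resolution.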

Findings

01

Interannotator agreement rose from the low 80s to the low 90s (precision and recall) when candidate noun-phrase tagging was separated from coreference linking

02

Only 16% of disagreements reflected genuine disagreement about coreference; the rest were mostly typographical errors or omissions that were easily reconciled

03

Clarified guidelines and task separation show promise but need more extensive validation

Abstract

We report here on a study of interannotator agreement in the coreference task as defined by the Message Understanding Conference (MUC-6 and MUC-7). Based on feedback from annotators, we clarified and simplified the annotation specification. We then performed an analysis of disagreement among several annotators, concluding that only 16% of the disagreements represented genuine disagreement about coreference; the remainder of the cases were mostly typographical errors or omissions, easily reconciled. Initially, we measured interannotator agreement in the low 80s for precision and recall. To try to improve upon this, we ran several experiments. In our final experiment, we separated the tagging of candidate noun phrases from the linking of actual coreferring expressions. This method shows promise - interannotator agreement climbed to the low 90s - but it needs more extensive validation.…
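To make "precision and recall" between two annotators concrete, here is a minimal sketch that treats one annotator's coreference chains as the key and the other's as the response. It assumes the link-based scoring scheme associated with the MUC evaluations (Vilain et al., 1995); the abstract does not spell out the scorer, so read this as an illustration, not the authors' exact procedure. The function names and the mention identifiers (`m1` ... `m5`) are hypothetical.

```python
from typing import Hashable, List, Set, Tuple

Chain = Set[Hashable]  # one coreference chain = a set of mention identifiers


def muc_recall(key: List[Chain], response: List[Chain]) -> float:
    """Link-based recall of `response` measured against `key`.

    Assumes each mention appears in at most one chain per annotator.
    """
    # Map each mention to the index of the response chain that contains it.
    mention_to_chain = {m: i for i, chain in enumerate(response) for m in chain}
    numerator = 0
    denominator = 0
    for chain in key:
        # Partition the key chain by the response chains; mentions the
        # response left unlinked each form their own singleton part.
        linked_parts = {mention_to_chain[m] for m in chain if m in mention_to_chain}
        singleton_parts = sum(1 for m in chain if m not in mention_to_chain)
        parts = len(linked_parts) + singleton_parts
        numerator += len(chain) - parts   # links the response recovered
        denominator += len(chain) - 1     # minimal links needed for the chain
    return numerator / denominator if denominator else 0.0


def agreement(annotator_a: List[Chain], annotator_b: List[Chain]) -> Tuple[float, float]:
    """Return (precision, recall) with A as key and B as response."""
    recall = muc_recall(annotator_a, annotator_b)
    precision = muc_recall(annotator_b, annotator_a)  # roles swapped
    return precision, recall


# Toy example (hypothetical data): the annotators disagree on where m3 belongs.
annotator_a = [{"m1", "m2", "m3"}, {"m4", "m5"}]
annotator_b = [{"m1", "m2"}, {"m3", "m4", "m5"}]
precision, recall = agreement(annotator_a, annotator_b)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Because precision is just recall with the roles swapped, the two numbers depend on which annotator is arbitrarily chosen as the key. That is one reason to report both when measuring agreement, as the abstract does.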

Taxonomy

Topics: Speech and dialogue systems · Natural Language Processing Techniques · Topic Modeling