Automating Coreference: The Role of Annotated Training Data
Lynette Hirschman, Patricia Robinson, John Burger, Marc Vilain

TL;DR
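This study investigates interannotator agreement in coreference annotation for the MUC-6 and MUC-7 task. It finds that clarifying the guidelines and separating noun-phrase tagging from coreference linking raised agreement from the low 80s to the low 90s, a promising result for building training data for automated coreference systems that still needs more extensive validation.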
Contribution
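The paper introduces a clarified, simplified annotation specification and a two-stage procedure that separates the tagging of candidate noun phrases from the linking of coreferring expressions. In the reported experiment this raised interannotator agreement, which should yield more consistent training data for automated coreference resolution.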
Findings
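Interannotator agreement (precision and recall) rose from the low 80s to the low 90s once noun-phrase tagging was separated from coreference linking
Only 16% of disagreements reflected genuine disagreement about coreference; the rest were mostly typographical errors or omissions that were easily reconciled
Clarified guidelines and task separation look promising but need more extensive validation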
Abstract
We report here on a study of interannotator agreement in the coreference task as defined by the Message Understanding Conference (MUC-6 and MUC-7). Based on feedback from annotators, we clarified and simplified the annotation specification. We then performed an analysis of disagreement among several annotators, concluding that only 16% of the disagreements represented genuine disagreement about coreference; the remainder of the cases were mostly typographical errors or omissions, easily reconciled. Initially, we measured interannotator agreement in the low 80s for precision and recall. To try to improve upon this, we ran several experiments. In our final experiment, we separated the tagging of candidate noun phrases from the linking of actual coreferring expressions. This method shows promise - interannotator agreement climbed to the low 90s - but it needs more extensive validation.…
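The abstract reports agreement as precision and recall but does not spell out the computation. As a rough illustration only, the sketch below assumes the link-based, model-theoretic scoring scheme associated with the MUC coreference evaluations, with one annotator treated as the key and the other as the response. The function names, the mention identifiers, and the toy chains are hypothetical and are not taken from the paper.

```python
from typing import Dict, Hashable, List, Set

Chain = Set[Hashable]  # one coreference chain = a set of mention ids


def muc_recall(key: List[Chain], response: List[Chain]) -> float:
    """Link-based recall of `response` measured against `key`.

    For each key chain S, count how many pieces S is split into by the
    response chains (mentions missing from the response count as their
    own piece), then sum (|S| - pieces) over sum (|S| - 1).
    """
    # Map each mention to the index of the response chain containing it.
    mention_to_chain: Dict[Hashable, int] = {}
    for idx, chain in enumerate(response):
        for mention in chain:
            mention_to_chain[mention] = idx

    found_links = 0
    total_links = 0
    for chain in key:
        pieces = set()
        unmatched = 0
        for mention in chain:
            if mention in mention_to_chain:
                pieces.add(mention_to_chain[mention])
            else:
                unmatched += 1  # absent from response: its own piece
        found_links += len(chain) - (len(pieces) + unmatched)
        total_links += len(chain) - 1
    return found_links / total_links if total_links else 0.0


def muc_precision(key: List[Chain], response: List[Chain]) -> float:
    """Precision is recall with the roles of key and response swapped."""
    return muc_recall(response, key)


# Hypothetical toy annotations from two annotators (not data from the paper).
annotator_a = [{"a", "b", "c", "d"}, {"e", "f"}]
annotator_b = [{"a", "b"}, {"c", "d"}, {"e", "f", "g"}]

recall = muc_recall(annotator_a, annotator_b)
precision = muc_precision(annotator_a, annotator_b)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this toy case annotator B splits A's four-mention chain in two, which costs one link of recall, and adds a mention that A never linked, which costs one link of precision. Because the measure is symmetric under swapping key and response, the choice of which annotator serves as the key only exchanges the two numbers.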
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques · Topic Modeling
