Anatomy of OntoGUM--Adapting GUM to the OntoNotes Scheme to Evaluate Robustness of SOTA Coreference Algorithms
Yilun Zhu, Sameer Pradhan, Amir Zeldes

TL;DR
This paper adapts the GUM corpus to the OntoNotes scheme to evaluate the robustness and generalizability of state-of-the-art coreference resolution models across diverse genres, revealing a performance degradation of nearly 15-20% on out-of-domain data.
Contribution
It details the deterministic mapping process from GUM to OntoNotes and provides an out-of-domain evaluation across 12 genres that highlights the limitations of existing models.
Findings
Nearly 15-20% performance degradation in out-of-domain evaluation across 12 genres, for both deterministic and deep learning systems
Existing models lack robustness to domain shifts
Highlights need for more generalizable coreference systems
Abstract
SOTA coreference resolution produces increasingly impressive scores on the OntoNotes benchmark. However, the lack of comparable data following the same scheme for more genres makes it difficult to evaluate generalizability to open-domain data. Zhu et al. (2021) introduced the OntoGUM corpus for evaluating the generalizability of the latest neural LM-based end-to-end systems. This paper covers the details of the mapping process, which is a set of deterministic rules applied to the rich, manually created syntactic and discourse annotations in the GUM corpus. Out-of-domain evaluation across 12 genres shows nearly 15-20% degradation for both deterministic and deep learning systems, indicating a lack of generalizability or covert overfitting in existing coreference resolution models.
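To make the idea of a deterministic mapping concrete, here is a minimal Python sketch of how a chain of rules might be applied to coreference clusters. It is an illustration only, not the authors' implementation: the data layout, the function names `drop_singletons` and `apply_rules`, and the choice of rule are all assumptions. The example rule relies on the fact that OntoNotes does not annotate entities mentioned only once, so a conversion toward that scheme would plausibly need to remove such singleton clusters.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical data layout: an entity id maps to its mention spans,
# each span being (start_token, end_token), inclusive.
Mention = Tuple[int, int]
Clusters = Dict[str, List[Mention]]
Rule = Callable[[Clusters], Clusters]


def drop_singletons(clusters: Clusters) -> Clusters:
    """Illustrative rule: keep only entities mentioned at least twice."""
    return {eid: spans for eid, spans in clusters.items() if len(spans) >= 2}


def apply_rules(clusters: Clusters, rules: List[Rule]) -> Clusters:
    """Apply each rule in order; the same input always gives the same output."""
    for rule in rules:
        clusters = rule(clusters)
    return clusters


# Toy input: entity e1 has two mentions, entity e2 has only one.
source_clusters: Clusters = {"e1": [(0, 1), (5, 5)], "e2": [(3, 3)]}
converted = apply_rules(source_clusters, [drop_singletons])
```

Because every rule is a pure function of the annotations, the conversion is reproducible, and further rules could be appended to the list without changing the driver.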
