LongtoNotes: OntoNotes with Longer Coreference Chains

Kumar Shridhar; Nicholas Monath; Raghuveer Thirukovalluru; Alessandro Stolfo; Manzil Zaheer; Andrew McCallum; Mrinmaya Sachan

arXiv:2210.03650·cs.CL·October 10, 2022

TL;DR

This paper introduces LongtoNotes, a coreference resolution corpus with longer documents, derived from OntoNotes. It enables better evaluation and understanding of model performance on lengthy documents across multiple genres.

Contribution

The work provides a manually-curated, longer document corpus for coreference resolution, addressing limitations of previous datasets and facilitating research on long-document modeling.

Findings

01 State-of-the-art models show performance drops on longer documents.

02 Model architecture and hyperparameters significantly affect performance and efficiency.

03 The new corpus reveals specific challenges in long-document coreference resolution.

Abstract

OntoNotes has served as the most important benchmark for coreference resolution. However, for ease of annotation, several long documents in OntoNotes were split into smaller parts. In this work, we build a corpus of coreference-annotated documents of significantly longer length than what is currently available. We do so by providing an accurate, manually curated merging of annotations from documents that were split into multiple parts in the original OntoNotes annotation process. The resulting corpus, which we call LongtoNotes, contains documents in multiple genres of the English language with varying lengths, the longest of which are up to 8x the length of documents in OntoNotes and 2x those in LitBank. We evaluate state-of-the-art neural coreference systems on this new corpus, analyze the relationships between model architectures/hyperparameters and document length on performance and…
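The merging of split-document annotations mentioned in the abstract can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function `merge_parts`, the data layout (part-local token spans keyed by per-part cluster IDs), and the `cross_part_links` list standing in for human merge decisions are all hypothetical names introduced for illustration.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]        # (start_token, end_token), inclusive, part-local
ClusterKey = Tuple[int, int]  # (part_index, cluster_id within that part)


def merge_parts(
    part_lengths: List[int],
    part_clusters: List[Dict[int, List[Span]]],
    cross_part_links: List[Tuple[ClusterKey, ClusterKey]],
) -> List[List[Span]]:
    """Merge per-part coreference clusters into document-level chains.

    Assumption: cross_part_links holds pairs of clusters that an annotator
    judged to refer to the same entity across different parts.
    """
    # Token offset of each part within the concatenated document.
    offsets = [0]
    for length in part_lengths[:-1]:
        offsets.append(offsets[-1] + length)

    # Union-find over (part, cluster_id) keys.
    parent: Dict[ClusterKey, ClusterKey] = {
        (p, cid): (p, cid)
        for p, clusters in enumerate(part_clusters)
        for cid in clusters
    }

    def find(key: ClusterKey) -> ClusterKey:
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # path halving
            key = parent[key]
        return key

    for a, b in cross_part_links:
        parent[find(a)] = find(b)

    # Shift spans to document-level token indices and group them by root.
    merged: Dict[ClusterKey, List[Span]] = {}
    for p, clusters in enumerate(part_clusters):
        for cid, spans in clusters.items():
            shifted = [(s + offsets[p], e + offsets[p]) for s, e in spans]
            merged.setdefault(find((p, cid)), []).extend(shifted)

    return [sorted(spans) for spans in merged.values()]


# Toy example: two parts of 10 and 8 tokens; cluster 0 of part 0 is linked
# to cluster 0 of part 1, so their mentions end up in one longer chain.
part_lengths = [10, 8]
part_clusters = [{0: [(0, 1), (4, 4)], 1: [(6, 7)]}, {0: [(2, 2)]}]
links = [((0, 0), (1, 0))]
chains = merge_parts(part_lengths, part_clusters, links)
```

Union-find is used here only because merge decisions are transitive: if an entity in part 1 is linked to one in part 2, and that one to part 3, all three clusters collapse into a single chain regardless of the order in which the links are given.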

Code & Models

Repositories

kumar-shridhar/longtonotes (Official)