Generalizing Cross-Document Event Coreference Resolution Across Multiple Corpora

Michael Bugert; Nils Reimers; Iryna Gurevych

arXiv:2011.12249·cs.CL·June 14, 2021

TL;DR

This paper evaluates the generalizability of cross-document event coreference resolution systems across multiple diverse corpora, revealing that current neural models often overfit to specific datasets and highlighting the need for multi-corpus evaluation.

Contribution

The paper introduces a uniform evaluation setup across three CDCR corpora, compares a feature-based system with a neural system, and reports how well each generalizes and where it overfits.

Findings

01 The feature-based system is more consistent across corpora.

02 The neural system's performance varies greatly between datasets.

03 Overfitting to the structure of the ECB+ corpus is observed.

Abstract

Cross-document event coreference resolution (CDCR) is an NLP task in which mentions of events need to be identified and clustered throughout a collection of documents. CDCR aims to benefit downstream multi-document applications, but despite recent progress on corpora and system development, downstream improvements from applying CDCR have not been shown yet. We make the observation that every CDCR system to date was developed, trained, and tested only on a single respective corpus. This raises strong concerns on their generalizability -- a must-have for downstream applications where the magnitude of domains or event mentions is likely to exceed those found in a curated corpus. To investigate this assumption, we define a uniform evaluation setup involving three CDCR corpora: ECB+, the Gun Violence Corpus and the Football Coreference Corpus (which we reannotate on token level to make our…

Code & Models

Repositories

UKPLab/cdcr-beyond-corpus-tailored
PyTorch · Official
