TL;DR
This paper evaluates the generalizability of cross-document event coreference resolution systems across multiple diverse corpora, revealing that current neural models often overfit to specific datasets and highlighting the need for multi-corpus evaluation.
Contribution
It introduces a uniform evaluation setup across three CDCR corpora, compares feature-based and neural systems, and provides insights into their generalizability and overfitting issues.
Findings
Feature-based system is more consistent across corpora.
Neural system performance varies greatly between datasets.
Overfitting to ECB+ corpus structure is observed.
Abstract
Cross-document event coreference resolution (CDCR) is an NLP task in which mentions of events need to be identified and clustered throughout a collection of documents. CDCR aims to benefit downstream multi-document applications, but despite recent progress on corpora and system development, downstream improvements from applying CDCR have not been shown yet. We make the observation that every CDCR system to date was developed, trained, and tested only on a single respective corpus. This raises strong concerns on their generalizability -- a must-have for downstream applications where the magnitude of domains or event mentions is likely to exceed those found in a curated corpus. To investigate this assumption, we define a uniform evaluation setup involving three CDCR corpora: ECB+, the Gun Violence Corpus and the Football Coreference Corpus (which we reannotate on token level to make our…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
