Realistic Evaluation Principles for Cross-document Coreference Resolution

Arie Cattan; Alon Eirew; Gabriel Stanovsky; Mandar Joshi; Ido Dagan

arXiv:2106.04192 · cs.CL · June 9, 2021

TL;DR

This paper critiques current evaluation practices for cross-document coreference resolution, proposing more realistic principles that significantly lower reported performance scores and better reflect real-world challenges.

Contribution

The paper introduces two evaluation principles to make model assessment more realistic: evaluate on predicted mentions rather than gold mentions, and do not exploit the synthetic topic structure of the ECB+ dataset.

Findings

1. A competitive model scores 33 F1 points lower under the proposed evaluation principles.
2. Without access to the synthetic topic structure of ECB+, models must confront lexical ambiguity directly.
3. Common evaluation practices are unrealistically permissive and yield inflated results.

Abstract

We point out that common evaluation practices for cross-document coreference resolution have been unrealistically permissive in their assumed settings, yielding inflated results. We propose addressing this issue via two evaluation methodology principles. First, as in other tasks, models should be evaluated on predicted mentions rather than on gold mentions. Doing this raises a subtle issue regarding singleton coreference clusters, which we address by decoupling the evaluation of mention detection from that of coreference linking. Second, we argue that models should not exploit the synthetic topic structure of the standard ECB+ dataset, forcing models to confront the lexical ambiguity challenge, as intended by the dataset creators. We demonstrate empirically the drastic impact of our more realistic evaluation principles on a competitive model, yielding a score which is 33 F1 lower…
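The decoupling described in the abstract can be illustrated with a minimal Python sketch. It is one plausible reading of the idea, not the authors' implementation: mention detection is scored on spans alone, and singleton clusters are removed from both gold and predicted clusterings before any coreference linking metric is computed. The function names and the `(doc_id, start, end)` mention representation are assumptions made for illustration.

```python
from typing import Hashable, Iterable, List, Set, Tuple

# Assumed representation: a mention is any hashable span id, e.g. (doc_id, start, end).
Mention = Hashable
Cluster = Set[Mention]


def mention_detection_f1(
    gold: Iterable[Cluster], pred: Iterable[Cluster]
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over mention spans, ignoring cluster assignments."""
    gold_mentions = {m for cluster in gold for m in cluster}
    pred_mentions = {m for cluster in pred for m in cluster}
    true_positives = len(gold_mentions & pred_mentions)
    precision = true_positives / len(pred_mentions) if pred_mentions else 0.0
    recall = true_positives / len(gold_mentions) if gold_mentions else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def drop_singletons(clusters: Iterable[Cluster]) -> List[Cluster]:
    """Keep only clusters with at least two mentions, i.e. at least one coreference link."""
    return [set(cluster) for cluster in clusters if len(cluster) > 1]


# Toy example with hypothetical mentions.
a, b, c, d, e = ("d1", 0, 1), ("d2", 3, 4), ("d1", 7, 8), ("d3", 2, 3), ("d3", 9, 10)
gold = [{a, b}, {c}, {d}]
pred = [{a, b}, {c}, {e}]

md_precision, md_recall, md_f1 = mention_detection_f1(gold, pred)
gold_links = drop_singletons(gold)  # only {a, b} remains
pred_links = drop_singletons(pred)  # only {a, b} remains
```

In this sketch, the filtered `gold_links` and `pred_links` would then be passed to a standard coreference scorer, so that correctly detected singleton mentions affect the mention detection score but not the linking score.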

Code & Models

Repositories

ariecattan/coref (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies