Interpretable Coreference Resolution Evaluation Using Explicit Semantics
Bruno Gatti, Giuliano Martinelli, Roberto Navigli

TL;DR
This paper introduces a semantically-enhanced evaluation framework for coreference resolution that provides detailed diagnostic insights by overlaying semantic labels onto coreference outputs, revealing system weaknesses and guiding targeted improvements.
Contribution
The authors propose a novel evaluation method that incorporates semantic labels into coreference resolution assessment, enabling more interpretable and diagnostic analysis.
Findings
The new framework uncovers systematic weaknesses in coreference models.
The resulting diagnostics can guide targeted data augmentation strategies.
Semantic-aware evaluation led to measurable out-of-domain improvements.
Abstract
Coreference resolution is typically evaluated using aggregate statistical metrics such as CoNLL-F1, which measure structural overlap between predicted and gold clusters. While widely used, these metrics offer limited diagnostic insights, penalizing errors without revealing whether a system struggles with specific semantic categories, such as people, locations, or events, and making it difficult to interpret model capabilities or derive actionable improvements. We address this gap by introducing a semantically-enhanced evaluation framework for coreference resolution. Our approach overlays Concept and Named Entity Recognition (CNER) onto coreference outputs, assigning semantic labels to nominal mentions and propagating them to entire coreference clusters. This enables the computation of typed scores aimed at evaluating mention extraction and linking capabilities stratified by semantic…
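The label overlay and typed scoring described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: mentions are represented as (start, end) token spans, `mention_labels` stands in for the output of a CNER tagger on nominal mentions, propagation is a simple majority vote within each cluster, and the typed score is a plain per-type mention precision/recall/F1. The function names `propagate_labels` and `typed_mention_scores` are hypothetical.

```python
from collections import Counter, defaultdict


def propagate_labels(clusters, mention_labels, default="UNK"):
    """Give each cluster the most common label among its labeled mentions.

    Majority voting is an assumption; the source only says labels are
    propagated from nominal mentions to entire clusters.
    """
    cluster_labels = []
    for cluster in clusters:
        votes = Counter(mention_labels[m] for m in cluster if m in mention_labels)
        cluster_labels.append(votes.most_common(1)[0][0] if votes else default)
    return cluster_labels


def typed_mention_scores(gold_clusters, gold_labels, pred_clusters, pred_labels):
    """Compute mention precision, recall, and F1 separately for each label."""
    gold_by_type = defaultdict(set)
    pred_by_type = defaultdict(set)
    for cluster, label in zip(gold_clusters, gold_labels):
        gold_by_type[label].update(cluster)
    for cluster, label in zip(pred_clusters, pred_labels):
        pred_by_type[label].update(cluster)

    scores = {}
    for label in set(gold_by_type) | set(pred_by_type):
        gold, pred = gold_by_type[label], pred_by_type[label]
        true_pos = len(gold & pred)
        precision = true_pos / len(pred) if pred else 0.0
        recall = true_pos / len(gold) if gold else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Toy data (hypothetical): only nominal mentions carry a label;
# the pronoun spans (5, 5) and (14, 14) are unlabeled.
mention_labels = {(0, 1): "PER", (10, 11): "LOC"}
gold_clusters = [[(0, 1), (5, 5)], [(10, 11), (14, 14)]]
pred_clusters = [[(0, 1), (5, 5)], [(10, 11)]]  # system missed span (14, 14)

gold_labels = propagate_labels(gold_clusters, mention_labels)
pred_labels = propagate_labels(pred_clusters, mention_labels)
scores = typed_mention_scores(gold_clusters, gold_labels, pred_clusters, pred_labels)
```

In this toy case the PER cluster is fully recovered while the LOC cluster loses one mention, so the LOC recall drops to 0.5 and the PER scores stay at 1.0. A single aggregate metric would hide which category the missed mention belongs to; the per-type breakdown makes it visible.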
