Interpretable Coreference Resolution Evaluation Using Explicit Semantics

Bruno Gatti, Giuliano Martinelli, Roberto Navigli

arXiv:2605.10627 · cs.CL · May 12, 2026

TL;DR

This paper introduces a semantically enhanced evaluation framework for coreference resolution that provides detailed diagnostic insight by overlaying semantic labels onto coreference outputs, revealing system weaknesses and guiding targeted improvements.

Contribution

The authors propose a novel evaluation method that incorporates semantic labels into the assessment of coreference resolution systems, enabling more interpretable and diagnostic analysis.

Findings

1. Uncovered systematic weaknesses in coreference models using the new framework.
2. Demonstrated that the resulting diagnostics can guide targeted data augmentation strategies.
3. Achieved measurable out-of-domain improvements guided by the semantic-aware evaluation.

Abstract

Coreference resolution is typically evaluated using aggregate statistical metrics such as CoNLL-F1, which measure structural overlap between predicted and gold clusters. While widely used, these metrics offer limited diagnostic insight: they penalize errors without revealing whether a system struggles with specific semantic categories, such as people, locations, or events, making it difficult to interpret model capabilities or derive actionable improvements. We address this gap by introducing a semantically enhanced evaluation framework for coreference resolution. Our approach overlays Concept and Named Entity Recognition (CNER) onto coreference outputs, assigning semantic labels to nominal mentions and propagating them to entire coreference clusters. This enables the computation of typed scores aimed at evaluating mention extraction and linking capabilities stratified by semantic…
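The abstract does not spell out how labels are propagated or how the typed scores are computed, so the following Python sketch shows one plausible reading, not the authors' implementation. CNER output is mocked as a span-to-label dict covering nominal mentions only; each cluster takes the majority label of its labeled mentions (pronouns do not vote, and unlabeled clusters fall back to "UNK"); mention extraction is scored per label, and linking is scored with MUC recall restricted to gold clusters of each label. The majority vote, the "UNK" fallback, the choice of MUC as the linking measure, the helper names (propagate_labels, typed_mention_scores, typed_muc_recall), and the placeholder labels PERSON, LOCATION and EVENT are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Assumption: mentions are (start_token, end_token) spans; clusters are lists
# of spans. CNER output is mocked as a dict {span: label} for nominal mentions.


def propagate_labels(clusters, cner, default="UNK"):
    """Give each cluster the majority CNER label of its labeled mentions.

    Unlabeled mentions (e.g. pronouns) do not vote; ties go to the label
    seen first; clusters with no labeled mention get `default`.
    Majority voting is an assumption, not a rule stated in the abstract.
    """
    labels = []
    for cluster in clusters:
        votes = Counter(cner[m] for m in cluster if m in cner)
        # max() over keys in insertion order returns the first top-voted label.
        labels.append(max(votes, key=votes.__getitem__) if votes else default)
    return labels


def typed_mention_scores(gold, gold_labels, pred, pred_labels):
    """Per-label mention-extraction (precision, recall, F1).

    Gold mentions take their gold cluster's label (recall side); predicted
    mentions take their predicted cluster's label (precision side).
    """
    gold_type = {m: t for c, t in zip(gold, gold_labels) for m in c}
    pred_type = {m: t for c, t in zip(pred, pred_labels) for m in c}
    gold_all, pred_all = set(gold_type), set(pred_type)
    scores = {}
    for t in sorted(set(gold_type.values()) | set(pred_type.values())):
        g = {m for m, lt in gold_type.items() if lt == t}
        p = {m for m, lt in pred_type.items() if lt == t}
        r = len(g & pred_all) / len(g) if g else 0.0
        pr = len(p & gold_all) / len(p) if p else 0.0
        f1 = 2 * pr * r / (pr + r) if pr + r else 0.0
        scores[t] = (pr, r, f1)
    return scores


def typed_muc_recall(gold, gold_labels, pred):
    """MUC recall computed separately over the gold clusters of each label."""
    pred_id = {m: i for i, c in enumerate(pred) for m in c}
    num, den = defaultdict(int), defaultdict(int)
    for cluster, t in zip(gold, gold_labels):
        # Partitions of this gold cluster induced by the predicted clusters;
        # a mention the system missed forms a partition on its own.
        parts = {pred_id.get(m, ("missing", m)) for m in cluster}
        num[t] += len(cluster) - len(parts)
        den[t] += len(cluster) - 1
    # Labels whose gold clusters are all singletons have no MUC links to score.
    return {t: num[t] / den[t] for t in den if den[t] > 0}


# Toy spans: gold cluster 1 is a person, gold cluster 2 a location;
# (5, 5), (9, 9) and (17, 17) stand in for pronouns with no CNER label.
cner = {(0, 1): "PERSON", (3, 4): "LOCATION", (12, 13): "LOCATION", (15, 16): "EVENT"}
gold = [[(0, 1), (5, 5), (9, 9)], [(3, 4), (12, 13)]]
pred = [[(0, 1), (5, 5)], [(3, 4), (12, 13)], [(15, 16), (17, 17)]]

gold_labels = propagate_labels(gold, cner)  # ['PERSON', 'LOCATION']
pred_labels = propagate_labels(pred, cner)  # ['PERSON', 'LOCATION', 'EVENT']
print(typed_mention_scores(gold, gold_labels, pred, pred_labels))
print(typed_muc_recall(gold, gold_labels, pred))  # {'PERSON': 0.5, 'LOCATION': 1.0}
```

In this toy case the system drops one pronominal mention of the person, which lowers PERSON recall in both the mention and linking scores while LOCATION stays perfect, and the spurious cluster only affects the EVENT row. A single aggregate score would average these differences away; stratifying by label is what makes the category-specific weakness visible.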
