Beyond Word Boundaries: A Hebrew Coreference Benchmark and an Evaluation Protocol for Morphologically Complex Text

Refael Shaked Greenfeld; Reut Tsarfaty

arXiv:2604.17108·cs.CL·April 21, 2026

TL;DR

This paper introduces KibutzR, a Hebrew coreference resolution dataset and evaluation protocol tailored for morphologically complex languages, revealing performance gaps in current models and highlighting the need for segmentation-aware approaches.

Contribution

The paper provides the first comprehensive Hebrew coreference resolution (CR) dataset, with mention annotations at multiple levels, and a new evaluation protocol that addresses mention-boundary discrepancies in morphologically rich languages (MRLs).

Findings

1. LLMs perform worse on Hebrew than on English.

2. Performance drops on raw, unsegmented Hebrew text.

3. Smaller encoder models outperform larger decoder models on Hebrew.

Abstract

Coreference Resolution (CR) is a fundamental NLP task critical for long-form tasks such as information extraction, summarization, and many business applications. However, CR methods originally designed for English struggle with Morphologically Rich Languages (MRLs), where mention boundaries do not necessarily align with word boundaries, and a single token may contain multiple anaphors. CR modeling and evaluation protocols typically assume that, as in English, words and mentions mostly align. This assumption breaks down in MRLs, particularly in the context of LLMs' raw-text processing and end-to-end tasks. To assess and address this challenge, we introduce KibutzR, the first comprehensive CR dataset for Modern Hebrew, an MRL rich with complex words and pronominal clitics. We deliver an annotated dataset that identifies mentions at word, sub-word and multi-word levels, and…
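To make the word/mention misalignment concrete, here is a minimal Python sketch that classifies a mention, given as a character span, as word, sub-word, or multi-word relative to whitespace-delimited tokens. It is only an illustration: the example sentence, the function names `token_spans` and `mention_level`, and the character-span representation are assumptions made for this sketch, not KibutzR's annotation format or the paper's evaluation protocol.

```python
import re


def token_spans(text):
    """Return (start, end) character spans of whitespace-delimited tokens."""
    return [m.span() for m in re.finditer(r"\S+", text)]


def mention_level(text, start, end):
    """Classify the mention text[start:end] relative to whitespace tokens.

    Hypothetical helper for illustration only, not the paper's protocol.
    """
    # Tokens that overlap the mention span at all.
    touched = [(s, e) for s, e in token_spans(text) if s < end and start < e]
    if not touched:
        raise ValueError("span does not overlap any token")
    if len(touched) > 1:
        return "multi-word"
    if touched[0] == (start, end):
        return "word"
    return "sub-word"


# Made-up example sentence: "Little Dani loves his house."
text = "דני הקטן אוהב את ביתו"

np_start = text.index("דני הקטן")   # two-word noun phrase, "little Dani"
house = text.index("ביתו")          # one token meaning "his house"

mentions = {
    "little Dani": (np_start, np_start + len("דני הקטן")),
    "his house": (house, house + len("ביתו")),
    # The final letter of the token is the pronominal clitic "his".
    "his (clitic)": (house + len("ביתו") - 1, house + len("ביתו")),
}

for gloss, (start, end) in mentions.items():
    print(gloss, "->", mention_level(text, start, end))
```

In this sketch the clitic and the noun phrase are coreferent, yet the clitic occupies only part of a token, so an evaluation that scores mentions at whole-word granularity cannot represent it without some form of segmentation.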
