HeQ: a Large and Diverse Hebrew Reading Comprehension Benchmark
Amir DN Cohen, Hilla Merhav, Yoav Goldberg, Reut Tsarfaty

TL;DR
This paper introduces HeQ, a Hebrew extractive question-answering benchmark of 30,147 question-answer pairs, built with annotation guidelines and evaluation metrics adapted to Hebrew's rich morphology. It shows that models that do well on morpho-syntactic tasks can still perform poorly on semantic ones.
Contribution
It presents the first diverse Hebrew Machine Reading Comprehension (MRC) dataset, together with annotation guidelines and evaluation metrics tailored to a morphologically rich language.
Findings
Standard evaluation metrics are inadequate for Hebrew MRC because answer-span boundaries in morphologically complex forms are indeterminate.
Models perform poorly on semantic tasks despite morpho-syntactic proficiency.
HeQ is intended to foster progress in Hebrew natural language understanding (NLU) and in research on morphologically rich languages (MRLs).
Abstract
Current benchmarks for Hebrew Natural Language Processing (NLP) focus mainly on morpho-syntactic tasks, neglecting the semantic dimension of language understanding. To bridge this gap, we set out to deliver a Hebrew Machine Reading Comprehension (MRC) dataset, where MRC is to be realized as extractive Question Answering. The morphologically rich nature of Hebrew poses a challenge to this endeavor: the indeterminacy and non-transparency of span boundaries in morphologically complex forms lead to annotation inconsistencies, disagreements, and flaws in standard evaluation metrics. To remedy this, we devise a novel set of guidelines, a controlled crowdsourcing protocol, and revised evaluation metrics that are suitable for the morphologically rich nature of the language. Our resulting benchmark, HeQ (Hebrew QA), features 30,147 diverse question-answer pairs derived from both Hebrew…
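To make the span-boundary problem concrete, here is a minimal sketch of the standard SQuAD-style exact-match and whitespace-token F1 scores applied to a Hebrew answer. This is the conventional baseline metric, not HeQ's revised metric, which is not described in the text above; the function names and the example strings are illustrative assumptions.

```python
from collections import Counter


def exact_match(prediction: str, gold: str) -> bool:
    """True when the two answers have identical whitespace tokens."""
    return prediction.split() == gold.split()


def token_f1(prediction: str, gold: str) -> float:
    """SQuAD-style F1 over whitespace tokens (no normalization applied)."""
    pred_tokens = prediction.split()
    gold_tokens = gold.split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the gold answer is "Jerusalem", while the predicted
# span keeps the attached preposition prefix ("in Jerusalem" is one word).
gold = "ירושלים"
prediction = "בירושלים"

em_score = exact_match(prediction, gold)
f1_score = token_f1(prediction, gold)
print(em_score, f1_score)
```

Because the preposition is written as a prefix of the same whitespace-delimited word, the two answers share no tokens, so a prediction that differs from the gold span by a single attached morpheme gets no credit under either score.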
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
