AmharicStoryQA: A Multicultural Story Question Answering Benchmark in Amharic

Israel Abebe Azime; Abenezer Kebede Angamo; Hana Mekonen Tamiru; Dagnachew Mekonnen Marilign; Philipp Slusallek; Seid Muhie Yimam; Dietrich Klakow

arXiv:2602.02774·cs.CL·February 4, 2026

TL;DR

AmharicStoryQA is a culturally diverse benchmark for evaluating Amharic language models' narrative understanding, revealing regional differences and highlighting the importance of culturally grounded assessments in low-resource languages.

Contribution

This work introduces AmharicStoryQA, a novel culturally diverse story question answering benchmark for Amharic, addressing the gap in evaluating regional and cultural understanding in language models.

Findings

01

Existing LLMs show significant narrative understanding gaps.

02

Regional differences significantly affect evaluation outcomes.

03

Supervised fine-tuning yields uneven improvements across regions.

Abstract

With the growing emphasis on multilingual and cultural evaluation benchmarks for large language models, language and culture are often treated as synonymous, and performance is commonly used as a proxy for a model's understanding of a given language. In this work, we argue that such evaluations overlook meaningful cultural variation that exists within a single language. We address this gap by focusing on narratives from different regions of Ethiopia and demonstrate that, despite shared linguistic characteristics, region-specific and domain-specific content substantially influences language evaluation outcomes. To this end, we introduce AmharicStoryQA, a long-sequence story question answering benchmark grounded in culturally diverse narratives from Amharic-speaking regions. Using this benchmark, we reveal a significant narrative understanding gap in existing LLMs,…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications