PerCul: A Story-Driven Cultural Evaluation of LLMs in Persian

Erfan Moosavi Monazzah; Vahid Rahimzadeh; Yadollah Yaghoobzadeh; Azadeh Shakery; Mohammad Taher Pilehvar

arXiv:2502.07459 · cs.CL · July 21, 2025


TL;DR

PerCul is a story-based, multiple-choice dataset for evaluating how sensitive LLMs are to Persian culture. The evaluated models fall well short of a layperson baseline.

Contribution

This paper presents PerCul, a story-based dataset curated with input from native Persian annotators to assess the cultural competence of LLMs in Persian, a setting that existing evaluations leave largely uncovered.

Findings

01

The best closed-source model trails the layperson baseline by 11.3%.

02

The best open-weight model trails the layperson baseline by 21.3%.

03

PerCul provides a new benchmark for cross-cultural NLP evaluation.
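The gaps reported above can be read as differences in multiple-choice accuracy between a model and the layperson baseline. The following is a minimal sketch under that assumption; the summary does not state whether the figures are percentage points or relative differences, and the `Item` structure, field names, and function names here are hypothetical, not taken from the PerCul release.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One story-based multiple-choice question (hypothetical layout)."""
    story: str
    choices: list[str]
    answer: int  # index of the correct choice


def accuracy(items: list[Item], predictions: list[int]) -> float:
    """Return the percentage of items whose predicted index matches the answer."""
    if not items or len(items) != len(predictions):
        raise ValueError("items and predictions must be non-empty and equal in length")
    correct = sum(1 for item, pred in zip(items, predictions) if pred == item.answer)
    return 100.0 * correct / len(items)


def gap_to_baseline(model_acc: float, baseline_acc: float) -> float:
    """Gap in percentage points; positive means the model trails the baseline."""
    return baseline_acc - model_acc
```

Under this reading, a model scoring 75.0 against a baseline of 100.0 would show a gap of 25.0 points; these numbers are illustrative only and are not results from the paper.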

Abstract

Large language models predominantly reflect Western cultures, largely due to the dominance of English-centric training data. This imbalance presents a significant challenge, as LLMs are increasingly used across diverse contexts without adequate evaluation of their cultural competence in non-English languages, including Persian. To address this gap, we introduce PerCul, a carefully constructed dataset designed to assess the sensitivity of LLMs toward Persian culture. PerCul features story-based, multiple-choice questions that capture culturally nuanced scenarios. Unlike existing benchmarks, PerCul is curated with input from native Persian annotators to ensure authenticity and to prevent the use of translation as a shortcut. We evaluate several state-of-the-art multilingual and Persian-specific LLMs, establishing a foundation for future research in cross-cultural NLP evaluation. Our…

Peer Reviews

No public reviews on file for this paper yet.

Code & Models

Datasets

teias-ai/percul

Videos

PerCul: A Story-Driven Cultural Evaluation of LLMs in Persian

Taxonomy

Topics: Translation Studies and Practices · Wikis in Education and Collaboration · AI in Service Interactions