TARAZ: Persian Short-Answer Question Benchmark for Cultural Evaluation of Language Models

Reihaneh Iranmanesh; Saeedeh Davoudi; Pasha Abrishamchian; Ophir Frieder; Nazli Goharian

arXiv:2602.22827 · cs.CL · March 17, 2026

TL;DR

This paper introduces a Persian-specific short-answer benchmark for evaluating the cultural understanding of language models. Answers are scored with a hybrid syntactic and semantic similarity method, which improves on traditional exact-match scoring.

Contribution

It presents the first standardized Persian cultural evaluation benchmark with a hybrid scoring method that captures semantic nuance and morphological complexity.

Findings

01. Hybrid evaluation improves scoring consistency by +10 over exact-match baselines.

02. Semantic similarity metric aligns better with human judgments.

03. Framework is publicly released for future research and benchmarking.

Abstract

This paper presents a comprehensive evaluation framework for assessing the cultural competence of large language models (LLMs) in Persian. Existing Persian cultural benchmarks rely predominantly on multiple-choice formats and English-centric metrics that fail to capture Persian's morphological complexity and semantic nuance. Our framework introduces a Persian-specific short-answer evaluation that combines rule-based morphological normalization with a hybrid syntactic and semantic similarity module, enabling robust soft-match scoring beyond exact string overlap. Through systematic evaluation of 15 state-of-the-art open- and closed-source models across three culturally grounded Persian datasets, we demonstrate that our hybrid evaluation improves scoring consistency by +10 compared to exact-match baselines by capturing meaning that surface-level methods cannot detect. Our human evaluation…
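
The soft-match idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the character map, the use of `difflib` for the syntactic part, the pluggable `semantic_sim` callback, the weight `alpha`, and the `threshold` are all assumptions made for illustration, and the abstract does not state how the paper actually combines the two similarity signals.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from typing import Callable, Optional

# Hypothetical rule set: a few common Arabic-vs-Persian character variants.
_CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # Arabic yeh -> Persian yeh
    "\u0649": "\u06CC",  # alef maksura -> Persian yeh
    "\u0643": "\u06A9",  # Arabic kaf -> Persian keheh
    "\u200C": " ",       # zero-width non-joiner -> space (a simplification)
    "\u0640": "",        # tatweel (kashida) removed
})
_DIACRITICS = re.compile(r"[\u064B-\u0652]")  # fathatan .. sukun


def normalize(text: str) -> str:
    """Rule-based normalization: unify variants, drop diacritics, squeeze spaces."""
    text = unicodedata.normalize("NFC", text)
    text = text.translate(_CHAR_MAP)
    text = _DIACRITICS.sub("", text)
    return " ".join(text.split())


def syntactic_sim(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (stand-in for the syntactic module)."""
    return SequenceMatcher(None, a, b).ratio()


def hybrid_score(
    pred: str,
    gold: str,
    semantic_sim: Optional[Callable[[str, str], float]] = None,
    alpha: float = 0.5,
) -> float:
    """Weighted mix of syntactic and (optional) semantic similarity.

    `semantic_sim` is a caller-supplied function, for example cosine similarity
    over sentence embeddings; the weighting scheme here is an assumption.
    """
    p, g = normalize(pred), normalize(gold)
    syn = syntactic_sim(p, g)
    if semantic_sim is None:
        return syn
    return alpha * syn + (1.0 - alpha) * semantic_sim(p, g)


def soft_match(pred: str, gold: str, threshold: float = 0.8, **kwargs) -> bool:
    """Accept the answer when the hybrid score reaches the (assumed) threshold."""
    return hybrid_score(pred, gold, **kwargs) >= threshold
```

In this sketch, two spellings of the same word that differ only in an Arabic versus Persian kaf fail a raw string comparison but receive a full score after normalization, which is the kind of surface variation exact match penalizes.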

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods