GhazalBench: Usage-Grounded Evaluation of LLMs on Persian Ghazals

Ghazal Kalhor; Yadollah Yaghoobzadeh

arXiv:2603.09979·cs.CL·March 12, 2026

TL;DR

GhazalBench is a benchmark that evaluates how well large language models paraphrase and recall Persian ghazals. Models generally capture poetic meaning but struggle to recall exact verses, particularly when compared with their performance on English sonnets.

Contribution

Introduces GhazalBench, a comprehensive evaluation framework for LLMs on Persian ghazals, emphasizing culturally grounded understanding and form-based recall.

Findings

01 Models understand poetic meaning well.

02 Models struggle with exact verse recall.

03 Recognition tasks improve recall performance.

Abstract

Persian poetry plays an active role in Iranian cultural practice, where verses by canonical poets such as Hafez are frequently quoted, paraphrased, or completed from partial cues. Supporting such interactions requires language models to engage not only with poetic meaning but also with culturally entrenched surface form. We introduce GhazalBench, a benchmark for evaluating how large language models (LLMs) interact with Persian ghazals under usage-grounded conditions. GhazalBench assesses two complementary abilities: producing faithful prose paraphrases of couplets and accessing canonical verses under varying semantic and formal cues. Across several proprietary and open-weight multilingual LLMs, we observe a consistent dissociation: models generally capture poetic meaning but struggle with exact verse recall in completion-based settings, while recognition-based tasks substantially reduce…
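
The contrast between completion-based and recognition-based settings can be made concrete with a small scoring sketch. This is an illustration under stated assumptions, not the authors' evaluation code: the abstract does not specify the metrics, so the function names (`normalize`, `completion_exact_match`, `recognition_accuracy`) and the normalization rules (mapping Arabic yeh and kaf to their Persian forms, treating the zero-width non-joiner as a space, dropping diacritics) are hypothetical choices.

```python
import unicodedata

# Hypothetical normalization table; the paper's actual scoring rules are not stated here.
_CHAR_MAP = str.maketrans({
    "\u064a": "\u06cc",  # Arabic yeh -> Persian yeh
    "\u0643": "\u06a9",  # Arabic kaf -> Persian kaf
    "\u200c": " ",       # zero-width non-joiner -> plain space
})


def normalize(verse: str) -> str:
    """Reduce a verse to a canonical form so encoding variants are not scored as errors."""
    text = unicodedata.normalize("NFC", verse).translate(_CHAR_MAP)
    # Drop remaining combining marks (for example short-vowel diacritics).
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Collapse runs of whitespace and trim the ends.
    return " ".join(text.split())


def completion_exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of generated verses identical to the canonical verse after normalization."""
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


def recognition_accuracy(chosen: list[int], gold: list[int]) -> float:
    """Fraction of multiple-choice items where the selected option index is the gold index."""
    hits = sum(c == g for c, g in zip(chosen, gold))
    return hits / len(gold)
```

Under this setup, a completion item is scored only when the model reproduces the whole verse, whereas a recognition item only requires picking the canonical verse from a set of candidates, which is one plausible reason the two settings could diverge.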

Taxonomy

Topics: Artificial Intelligence in Games · Topic Modeling · Sentiment Analysis and Opinion Mining