When Words Don't Mean What They Say: Figurative Understanding in Bengali Idioms
Adib Sakhawat, Shamim Ara Parveen, Md Ruhul Amin, Shamim Al Mahmud, Md Saiful Islam, Tahera Khatun

TL;DR
This paper introduces a dataset of 10,361 Bengali idioms and a benchmark for evaluating how well large language models understand figurative language in a low-resource, culturally rich setting. No evaluated model exceeds 50% accuracy, against 83.4% for humans.
Contribution
A large-scale, culturally grounded dataset of 10,361 Bengali idioms, each annotated under a 19-field schema, together with a benchmark for evaluating LLMs' understanding of figurative language.
Findings
None of the 30 evaluated LLMs exceeds 50% accuracy on the benchmark.
Human performance reaches 83.4%, more than 33 percentage points above the best model.
The dataset and benchmark reveal significant gaps in LLMs' cultural and figurative understanding.
Abstract
Figurative language understanding remains a significant challenge for Large Language Models (LLMs), especially for low-resource languages. To address this, we introduce a new idiom dataset, a large-scale, culturally-grounded corpus of 10,361 Bengali idioms. Each idiom is annotated under a comprehensive 19-field schema, established and refined through a deliberative expert consensus process, that captures its semantic, syntactic, cultural, and religious dimensions, providing a rich, structured resource for computational linguistics. To establish a robust benchmark for Bangla figurative language understanding, we evaluate 30 state-of-the-art multilingual and instruction-tuned LLMs on the task of inferring figurative meaning. Our results reveal a critical performance gap, with no model surpassing 50% accuracy, a stark contrast to significantly higher human performance (83.4%). This…
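As a rough illustration of how an accuracy figure like those reported above could be computed, the following is a minimal sketch and not the authors' evaluation code. The record fields (`idiom`, `figurative_meaning`), the toy items, and the exact-match scoring rule are all assumptions made for illustration; the abstract does not state how a predicted meaning is judged correct, and the actual dataset uses a 19-field schema not reproduced here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class IdiomItem:
    # Hypothetical two-field record; the real schema has 19 fields.
    idiom: str
    figurative_meaning: str  # gold figurative meaning


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.strip().lower().split())


def accuracy(items: Sequence[IdiomItem], predictions: Sequence[str]) -> float:
    """Fraction of predictions matching the gold meaning after normalization.

    Exact match is an assumed scoring rule, chosen only to keep the sketch small.
    """
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    correct = sum(
        normalize(pred) == normalize(item.figurative_meaning)
        for item, pred in zip(items, predictions)
    )
    return correct / len(items)


# Toy placeholder data, not drawn from the dataset.
items = [
    IdiomItem("idiom A", "to waste effort"),
    IdiomItem("idiom B", "a rare visitor"),
]
predictions = ["  To waste  effort ", "a frequent visitor"]
print(accuracy(items, predictions))  # 0.5
```

Exact matching is strict for free-text meanings; a real evaluation would more plausibly use multiple-choice options or a graded judgment, so this sketch only shows the shape of the computation.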
Taxonomy
Topics: Language, Metaphor, and Cognition · Sentiment Analysis and Opinion Mining · Topic Modeling
