BengaliMoralBench: A Benchmark for Auditing Moral Reasoning in Large Language Models within Bengali Language and Culture
Shahriyar Zaman Ridoy, Azmine Toushik Wasi, Koushik Ahamed Tonmoy, Taki Hasan Rafi, Dong-Kyu Chae

TL;DR
BengaliMoralBench is a comprehensive ethics benchmark for evaluating large language models' moral reasoning within Bengali cultural and linguistic contexts, addressing the lack of culturally grounded assessment tools.
Contribution
It introduces a large-scale, culturally grounded ethics benchmark for Bengali that spans five moral domains (50 subtopics in total) and evaluates models under three ethical lenses.
Findings
Models show significant variation in moral reasoning performance.
Current LLMs exhibit weaknesses in cultural grounding and fairness.
The benchmark reveals critical limitations in non-Western moral understanding.
Abstract
As multilingual Large Language Models (LLMs) gain traction across South Asia, their alignment with local ethical norms, particularly for Bengali, spoken by over 285 million people worldwide and among the most widely spoken languages globally, remains underexplored. Existing ethics benchmarks are predominantly English-centric and shaped by Western moral frameworks, overlooking cultural nuances vital for real-world deployment. To address this gap, we introduce BengaliMoralBench, a large-scale ethics benchmark designed for Bengali language and sociocultural contexts. Our benchmark spans five moral domains: (1) Daily Activities, (2) Habits, (3) Parenting, (4) Family Relationships, and (5) Religious Activities, each subdivided into ten culturally grounded categories, totaling 50 subtopics. Each scenario is annotated through native-speaker consensus under three ethical lenses: virtue ethics,…
