Benchmarking Bengali Dialectal Bias: A Multi-Stage Framework Integrating RAG-Based Translation and Human-Augmented RLAIF
K. M. Jubair Sami, Dipto Sumit, Ariyan Hossain, Farig Sadeque

TL;DR
This paper introduces a comprehensive framework to evaluate dialectal bias in Bengali language models, combining novel translation quality assessment, a large benchmark dataset, and a bias sensitivity metric, revealing significant performance disparities across dialects.
Contribution
It presents a multi-stage evaluation framework integrating RAG-based translation and human-augmented RLAIF, along with a new bias sensitivity metric and a benchmark dataset for Bengali dialects.
Findings
Question-answering accuracy drops significantly on dialectal variants of the questions.
Traditional translation metrics are ineffective for dialects; LLM-based evaluation correlates better with human judgment.
Model scale does not consistently reduce dialectal bias.
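The second finding rests on comparing how well each automatic score tracks human ratings. One common way to measure that is a rank correlation. The sketch below is a minimal illustration and not the paper's procedure: the choice of Spearman correlation, the function names, and the toy scores are all assumptions made here for the example.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data (not from the paper): five translations scored three ways.
human = [1, 2, 3, 4, 5]                          # human fidelity ratings
llm_judge = [1.5, 2.0, 3.5, 3.9, 4.8]            # LLM-as-a-judge scores
overlap_metric = [0.30, 0.10, 0.25, 0.05, 0.20]  # n-gram overlap style metric

print(spearman(human, llm_judge))
print(spearman(human, overlap_metric))
```

In this toy data the judge scores rise with the human ratings while the overlap scores do not, which is the kind of gap the finding describes. The numbers themselves carry no evidential weight.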
Abstract
Large language models (LLMs) frequently exhibit performance biases against regional dialects of low-resource languages. However, frameworks to quantify these disparities remain scarce. We propose a two-phase framework to evaluate dialectal bias in LLM question-answering across nine Bengali dialects. First, we translate standard Bengali questions into dialectal variants and gold-label them, using a retrieval-augmented generation (RAG) pipeline to prepare 4,000 question sets. Since traditional translation quality metrics fail on unstandardized dialects, we evaluate fidelity using an LLM-as-a-judge, which correlates better with human judgment than legacy metrics. Second, we benchmark 19 LLMs across these gold-labeled sets, running 68,395 RLAIF evaluations validated through multi-judge agreement and human fallback. Our findings reveal severe performance drops linked to linguistic…
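The abstract mentions validating evaluations through multi-judge agreement with a human fallback but does not give the rule. One plausible shape for such a step is sketched below. The function name `resolve_verdict`, the label strings, and the `min_agreement` threshold are hypothetical choices for illustration, not details taken from the paper.

```python
from collections import Counter


def resolve_verdict(judge_votes, min_agreement=1.0):
    """Return (verdict, needs_human) for one evaluated answer.

    judge_votes: labels from independent LLM judges, e.g. "correct"/"incorrect".
    min_agreement: assumed fraction of judges that must share the top label;
                   the default of 1.0 demands unanimity.
    """
    if not judge_votes:
        # No automatic judgment available, so a human has to decide.
        return None, True
    label, count = Counter(judge_votes).most_common(1)[0]
    if count / len(judge_votes) >= min_agreement:
        return label, False
    # Judges disagree too much: leave the verdict open and escalate.
    return None, True


# Hypothetical votes from three judges on three answers.
print(resolve_verdict(["correct", "correct", "correct"]))
print(resolve_verdict(["correct", "incorrect", "correct"]))
print(resolve_verdict(["correct", "incorrect", "correct"], min_agreement=0.6))
```

A stricter threshold sends more items to human review and a looser one trusts the judges more. Where the paper sets that trade-off is not stated in the abstract.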
Taxonomy
Topics: Natural Language Processing Techniques · Linguistic Variation and Morphology · Authorship Attribution and Profiling
