PakBBQ: A Culturally Adapted Bias Benchmark for QA

Abdullah Hashmat; Muhammad Arham Mirza; Agha Ali Raza

arXiv:2508.10186·cs.CL·September 30, 2025

TL;DR

This paper introduces PakBBQ, a culturally adapted bias benchmark for QA in Urdu and English, revealing bias patterns and mitigation strategies in multilingual LLMs within Pakistani contexts.

Contribution

It presents the first culturally and regionally tailored bias benchmark for QA, enabling bias evaluation and mitigation in low-resource language settings.

Findings

01. Disambiguation improves accuracy by 12%.

02. Urdu responses show stronger bias mitigation than English.

03. Negative framing reduces stereotypical responses.

Abstract

With the widespread adoption of Large Language Models (LLMs) across various applications, it is imperative to ensure their fairness across all user communities. However, most LLMs are trained and evaluated on Western-centric data, with little attention paid to low-resource languages and regional contexts. To address this gap, we introduce PakBBQ, a culturally and regionally adapted extension of the original Bias Benchmark for Question Answering (BBQ) dataset. PakBBQ comprises over 214 templates and 17,180 QA pairs in both English and Urdu, covering eight bias categories that are relevant in Pakistan: age, disability, appearance, gender, socio-economic status, religion, regional affiliation, and language formality. We evaluate multiple multilingual LLMs under both ambiguous and explicitly disambiguated contexts, as well as negative versus non-negative question…
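The evaluation axes named in the abstract (ambiguous versus disambiguated context, negative versus non-negative question framing, English versus Urdu) can be illustrated with a small scoring sketch. This is not the authors' code: the `QAItem` record, its field names, the `accuracy_by` helper, and the toy records are all hypothetical and chosen only to show how accuracy might be broken down along each axis.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QAItem:
    """One benchmark item. All field names are hypothetical, not PakBBQ's schema."""
    category: str    # e.g. "age", "gender"
    language: str    # "en" or "ur"
    context: str     # "ambiguous" or "disambiguated"
    polarity: str    # "negative" or "non_negative"
    gold: int        # index of the correct answer option
    predicted: int   # index of the option the model chose


def accuracy_by(items, key):
    """Group items by the attribute named `key` and return {group: accuracy}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        group = getattr(item, key)
        total[group] += 1
        correct[group] += int(item.predicted == item.gold)
    return {group: correct[group] / total[group] for group in total}


# Toy, made-up records used only to exercise the function (not PakBBQ data).
toy_items = [
    QAItem("age", "en", "ambiguous", "negative", gold=2, predicted=0),
    QAItem("age", "en", "ambiguous", "non_negative", gold=2, predicted=2),
    QAItem("age", "ur", "disambiguated", "negative", gold=0, predicted=0),
    QAItem("age", "ur", "disambiguated", "non_negative", gold=1, predicted=1),
]

by_context = accuracy_by(toy_items, "context")
by_polarity = accuracy_by(toy_items, "polarity")
by_language = accuracy_by(toy_items, "language")
```

Comparing `by_context["disambiguated"]` with `by_context["ambiguous"]` gives the kind of accuracy gap that the first finding refers to; the same helper can be pointed at `"polarity"` or `"language"` for the other two comparisons.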
