Dual-Metric Evaluation of Social Bias in Large Language Models: Evidence from an Underrepresented Nepali Cultural Context

Ashish Pandey; Tek Raj Chhetri

arXiv:2603.07792 · cs.CL · March 10, 2026

TL;DR

This paper systematically evaluates social biases in seven large language models within the Nepali cultural context. It finds measurable explicit and implicit biases that vary with decoding parameters and emphasizes the need for culturally grounded bias mitigation.

Contribution

The paper introduces the Dual-Metric Bias Assessment (DMBA) framework for evaluating bias in LLMs in underrepresented cultural settings and analyzes bias behavior comprehensively across models and decoding parameters.

Findings

1. Models exhibit measurable explicit agreement bias (mean bias agreement of 0.36 to 0.43).

2. The implicit completion bias rate is high (0.740 to 0.755) and varies with temperature.

3. Implicit bias peaks at moderate stochasticity (T = 0.3) and is stable across top-p settings.
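The temperature and top-p findings come from repeating the implicit probe under different decoding settings and comparing completion bias rates. The sketch below shows one way such a sweep could be aggregated, assuming each completion is logged as a hypothetical `(temperature, top_p, chose_stereotypical)` tuple. The grid and the toy records are illustrative only and are not the paper's data or code.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Assumed log layout: (temperature, top_p, chose_stereotypical).
Record = Tuple[float, float, bool]
Cell = Tuple[float, float]


def rates_by_setting(records: Iterable[Record]) -> Dict[Cell, float]:
    """Completion bias rate (share of stereotypical picks) per (temperature, top_p) cell."""
    cells: Dict[Cell, List[bool]] = defaultdict(list)
    for temperature, top_p, stereotypical in records:
        cells[(temperature, top_p)].append(stereotypical)
    return {cell: sum(flags) / len(flags) for cell, flags in cells.items()}


def group_by_temperature(cell_rates: Dict[Cell, float]) -> Dict[float, List[float]]:
    """Collect the per-cell rates for each temperature across all top_p values."""
    grouped: Dict[float, List[float]] = defaultdict(list)
    for (temperature, _top_p), rate in cell_rates.items():
        grouped[temperature].append(rate)
    return dict(grouped)


# Hypothetical toy log for illustration; these are not the paper's measurements.
records: List[Record] = [
    (0.0, 0.9, True), (0.0, 0.9, False), (0.0, 1.0, True), (0.0, 1.0, False),
    (0.3, 0.9, True), (0.3, 0.9, True), (0.3, 1.0, True), (0.3, 1.0, True),
    (0.7, 0.9, True), (0.7, 0.9, False), (0.7, 1.0, True), (0.7, 1.0, False),
]

grouped = group_by_temperature(rates_by_setting(records))
# Mean rate per temperature, averaged over top_p settings.
mean_rate = {t: mean(rates) for t, rates in grouped.items()}
# Spread across top_p at a fixed temperature; near zero means top_p barely matters.
top_p_spread = {t: max(rates) - min(rates) for t, rates in grouped.items()}
# Temperature at which the mean completion bias rate is highest.
peak_temperature = max(mean_rate, key=mean_rate.get)
print(mean_rate, top_p_spread, peak_temperature)
```

Averaging over top-p before locating the peak keeps the two decoding knobs separate, so a flat per-temperature spread can be read as "stable across top-p" independently of where the temperature peak falls.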

Abstract

Large language models (LLMs) increasingly influence global digital ecosystems, yet their potential to perpetuate social and cultural biases remains poorly understood in underrepresented contexts. This study presents a systematic analysis of representational biases in the Nepali cultural context across seven state-of-the-art LLMs: GPT-4o-mini, Claude-3-Sonnet, Claude-4-Sonnet, Gemini-2.0-Flash, Gemini-2.0-Lite, Llama-3-70B, and Mistral-Nemo. Using a Croissant-compliant dataset of 2400+ stereotypical and anti-stereotypical sentence pairs on gender roles across social domains, we implement an evaluation framework, Dual-Metric Bias Assessment (DMBA), that combines two metrics: (1) agreement with biased statements and (2) stereotypical completion tendencies. Results show that models exhibit measurable explicit agreement bias, with mean bias agreement ranging from 0.36 to 0.43 across decoding…
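Both DMBA metrics can be read as simple proportions. The sketch below is a minimal illustration under stated assumptions: each explicit probe's response has already been parsed into agree/disagree, and each implicit probe offers one stereotypical and one anti-stereotypical completion. The record types, field names, and exact definitions are hypothetical and are not taken from the paper's implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Literal


# Hypothetical record types; the paper's actual schema may differ.
@dataclass
class AgreementProbe:
    statement_is_stereotypical: bool  # whether the statement shown to the model is the biased one
    model_agrees: bool                # parsed from the model's response


@dataclass
class CompletionProbe:
    chosen: Literal["stereotypical", "anti-stereotypical"]  # option the model completed with


def bias_agreement(probes: Iterable[AgreementProbe]) -> float:
    """Explicit metric (assumed definition): share of stereotypical statements the model agrees with."""
    relevant = [p for p in probes if p.statement_is_stereotypical]
    if not relevant:
        raise ValueError("no stereotypical statements to score")
    return sum(p.model_agrees for p in relevant) / len(relevant)


def completion_bias_rate(probes: Iterable[CompletionProbe]) -> float:
    """Implicit metric (assumed definition): share of completions that pick the stereotypical option."""
    items = list(probes)
    if not items:
        raise ValueError("no completions to score")
    return sum(p.chosen == "stereotypical" for p in items) / len(items)


# Hypothetical toy data for illustration; not the paper's results.
agreement = [
    AgreementProbe(True, True),
    AgreementProbe(True, False),
    AgreementProbe(True, False),
    AgreementProbe(False, True),  # anti-stereotypical statement, ignored by this metric
]
completions = [CompletionProbe("stereotypical")] * 3 + [CompletionProbe("anti-stereotypical")]

print(round(bias_agreement(agreement), 3), completion_bias_rate(completions))
```

Keeping the two scores separate mirrors the framework's premise that a model can disagree with an explicitly biased statement while still favoring stereotypical continuations when asked to complete a sentence.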

Taxonomy

Topics: Computational and Text Analysis Methods · Language and cultural evolution · Topic Modeling