How Deep Is Representational Bias in LLMs? The Cases of Caste and Religion

Agrima Seth; Monojit Choudhary; Sunayana Sitaram; Kentaro Toyama; Aditya Vashistha; Kalika Bali

arXiv:2508.03712 · cs.CL · August 7, 2025

TL;DR

This study systematically audits GPT-4 Turbo to measure the depth and persistence of representational biases related to caste and religion in India, revealing overrepresentation of dominant groups despite diversity prompts.

Contribution

The paper provides a novel, large-scale analysis of caste and religious bias in LLMs, extending bias research beyond commonly studied identities such as race and gender.

Findings

1. GPT-4 Turbo overrepresents dominant groups beyond their share of the population.

2. Repeated prompting has only a limited effect on reducing these biases.

3. The biases are more entrenched than the training data distribution suggests.

Abstract

Representational bias in large language models (LLMs) has predominantly been measured through single-response interactions and has focused on Global North-centric identities like race and gender. We expand on that research by conducting a systematic audit of GPT-4 Turbo to reveal how deeply encoded representational biases are and how they extend to less-explored dimensions of identity. We prompt GPT-4 Turbo to generate over 7,200 stories about significant life events (such as weddings) in India, using prompts designed to encourage diversity to varying extents. Comparing the diversity of religious and caste representation in the outputs against the actual population distribution in India as recorded in census data, we quantify the presence and "stickiness" of representational bias in the LLM for religion and caste. We find that GPT-4 responses consistently overrepresent culturally…
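
The comparison described in the abstract, representation in generated stories versus a reference population distribution, can be sketched as a simple ratio of shares. The following is a minimal illustration and not the authors' method: the function name `representation_ratios`, the group names, the story labels, and the reference shares are all hypothetical placeholders, not census figures or results from the paper.

```python
from collections import Counter


def representation_ratios(labels, reference_shares):
    """Return observed share / reference share for each group.

    A ratio above 1 means the group appears more often in the outputs
    than its reference population share; below 1 means less often.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must not be empty")

    ratios = {}
    for group, ref_share in reference_shares.items():
        if ref_share <= 0:
            raise ValueError(f"reference share for {group!r} must be positive")
        observed_share = counts.get(group, 0) / total
        ratios[group] = observed_share / ref_share
    return ratios


# Hypothetical data: one identity label per generated story.
labels = ["group_a"] * 90 + ["group_b"] * 8 + ["group_c"] * 2

# Hypothetical reference shares (placeholders, NOT real census values).
reference_shares = {"group_a": 0.80, "group_b": 0.15, "group_c": 0.05}

ratios = representation_ratios(labels, reference_shares)
for group, ratio in sorted(ratios.items()):
    print(f"{group}: {ratio:.2f}")
```

With these placeholder inputs, `group_a` has a ratio above 1 (overrepresented) while `group_b` and `group_c` fall below 1. Running the same computation separately for each prompt variant would be one way to see whether stronger diversity nudges move the ratios toward 1.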
