Race, Ethnicity and Their Implication on Bias in Large Language Models

Shiyue Hu; Ruizhe Li; Yanjun Gao

arXiv:2601.12868 · cs.CL · January 21, 2026

Open Access

TL;DR

This study investigates how large language models encode race and ethnicity internally. It finds that these representations vary across models and that interventions only partially reduce bias, which points to the need for more systematic mitigation strategies.

Contribution

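The paper provides a mechanistic analysis of how demographic information is represented in LLMs. It uses interpretability techniques (probing, neuron-level attribution, and targeted intervention) to show how this information is encoded internally and how interventions affect bias.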

Findings

01

Demographic information is distributed across internal units, with substantial cross-model variation.

02

Some units encode stereotypes from pretraining.

03

Interventions reduce bias but leave residual effects.

Abstract

Large language models (LLMs) increasingly operate in high-stakes settings including healthcare and medicine, where demographic attributes such as race and ethnicity may be explicitly stated or implicitly inferred from text. However, existing studies primarily document outcome-level disparities, offering limited insight into internal mechanisms underlying these effects. We present a mechanistic study of how race and ethnicity are represented and operationalized within LLMs. Using two publicly available datasets spanning toxicity-related generation and clinical narrative understanding tasks, we analyze three open-source models with a reproducible interpretability pipeline combining probing, neuron-level attribution, and targeted intervention. We find that demographic information is distributed across internal units with substantial cross-model variation. Although some units encode…
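
The three stages named in the abstract (probing, neuron-level attribution, targeted intervention) can be illustrated with a toy sketch. This is not the authors' pipeline: the activations are synthetic, the demographic label is a random binary variable, and every function name and parameter below (`make_synthetic_activations`, `train_probe`, `mean_ablate`, the choice of signal units) is a hypothetical assumption made for illustration. Ranking units by probe weight is used here as a simple stand-in for whatever attribution method the paper applies.

```python
import numpy as np

SIGNAL_UNITS = (3, 17, 42)  # hypothetical units that carry the label signal


def make_synthetic_activations(n=2000, d=64, seed=0):
    """Gaussian 'activations' where a few units are shifted by the label."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)
    acts = rng.normal(size=(n, d))
    for u in SIGNAL_UNITS:
        acts[:, u] += 2.0 * (2 * labels - 1)  # shift by +2 or -2
    return acts, labels


def train_probe(acts, labels, lr=0.1, steps=500):
    """Probing: fit a logistic-regression probe by gradient descent."""
    n, d = acts.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(acts @ w + b)))
        grad = p - labels
        w -= lr * (acts.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def probe_accuracy(acts, labels, w, b):
    preds = (acts @ w + b > 0).astype(int)
    return float((preds == labels).mean())


def top_units(w, k):
    """Attribution stand-in: rank units by absolute probe weight."""
    return np.argsort(-np.abs(w))[:k]


def mean_ablate(acts, units):
    """Intervention: replace the chosen units with their mean activation."""
    out = acts.copy()
    out[:, units] = acts[:, units].mean(axis=0)
    return out


acts, labels = make_synthetic_activations()
train_x, train_y = acts[:1500], labels[:1500]
test_x, test_y = acts[1500:], labels[1500:]

w, b = train_probe(train_x, train_y)
acc_before = probe_accuracy(test_x, test_y, w, b)

units = top_units(w, k=len(SIGNAL_UNITS))
acc_after = probe_accuracy(mean_ablate(test_x, units), test_y, w, b)

print("units ranked highest by the probe:", sorted(units.tolist()))
print("probe accuracy before ablation:", acc_before)
print("probe accuracy after ablation:", acc_after)
```

In this toy setup the signal lives in three units by construction, so ablating them removes nearly all of it. The paper reports the opposite situation for real models: demographic information is spread across many units, and interventions leave residual effects.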

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Machine Learning in Healthcare