Explicit vs. Implicit: Investigating Social Bias in Large Language Models through Self-Reflection

Yachao Zhao; Bo Wang; Yan Wang; Dongming Zhao; Ruifang He; Yuexian Hou

arXiv:2501.02295·cs.CL·June 4, 2025

Open Access

TL;DR

This paper introduces a framework inspired by social psychology to systematically compare explicit and implicit biases in large language models, revealing significant differences between the two and the factors that affect how each one manifests.

Contribution

It presents a novel self-reflection-based evaluation method for implicit bias and uncovers the contrasting behaviors of explicit and implicit biases in LLMs.

Findings

01 Implicit bias is stronger and more persistent than explicit bias.

02 Explicit bias decreases with larger training data and models, while implicit bias increases.

03 Alignment techniques reduce explicit bias but have limited impact on implicit bias.

Abstract

Large Language Models (LLMs) have been shown to exhibit various biases and stereotypes in their generated content. While extensive research has investigated biases in LLMs, prior work has predominantly focused on explicit bias, with minimal attention to implicit bias and the relation between these two forms of bias. This paper presents a systematic framework grounded in social psychology theories to investigate and compare explicit and implicit biases in LLMs. We propose a novel self-reflection-based evaluation framework that operates in two phases: first measuring implicit bias through simulated psychological assessment methods, then evaluating explicit bias by prompting LLMs to analyze their own generated content. Through extensive experiments on advanced LLMs across multiple social dimensions, we demonstrate that LLMs exhibit a substantial inconsistency between explicit and implicit…
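The two-phase structure described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the prompt wording, the association-style task in phase one, the function names `implicit_phase` and `explicit_phase`, and the `toy_model` stub that stands in for a real LLM call. It is not the authors' implementation or their prompts.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM call: takes a prompt, returns text.
Model = Callable[[str], str]


def implicit_phase(model: Model, targets: List[str], attributes: List[str]) -> Dict[str, str]:
    """Phase 1 (assumed form): have the model pair each attribute with one target."""
    associations = {}
    for attr in attributes:
        prompt = (
            f"Pick one of {targets} for the word '{attr}'. "
            "Answer with the word only."
        )
        associations[attr] = model(prompt).strip()
    return associations


def explicit_phase(model: Model, associations: Dict[str, str]) -> Dict[str, bool]:
    """Phase 2 (assumed form): ask the model to judge its own phase-1 pairings."""
    judgments = {}
    for attr, target in associations.items():
        prompt = (
            f"Is associating '{attr}' with '{target}' a stereotype? "
            "Answer yes or no."
        )
        # True means the model itself labels the pairing as stereotypical.
        judgments[attr] = model(prompt).strip().lower().startswith("yes")
    return judgments


def toy_model(prompt: str) -> str:
    """Deterministic stub so the sketch runs without any API access."""
    if prompt.startswith("Is associating"):
        return "Yes"
    return "group_a"


associations = implicit_phase(toy_model, ["group_a", "group_b"], ["career"])
judgments = explicit_phase(toy_model, associations)
```

In this sketch, an inconsistency of the kind the abstract mentions would appear as a pairing the model produces in phase one and then labels as stereotypical in phase two. Replacing `toy_model` with a real model client is left to the reader.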

Taxonomy

Topics: Computational and Text Analysis Methods · Topic Modeling · Hate Speech and Cyberbullying Detection