Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective

Yuchen Wen; Keping Bi; Wei Chen; Jiafeng Guo; Xueqi Cheng

arXiv:2406.14023·cs.CL·July 14, 2025

TL;DR

This paper introduces psychometrically inspired attack methods and benchmarks to evaluate and reveal implicit biases in large language models, highlighting ethical risks and promoting accountability.

Contribution

The paper proposes three novel attack approaches (Disguise, Deception, and Teaching) and two benchmarks for assessing implicit bias in LLMs from a psychometric perspective.

Findings

01 The proposed attack methods elicit bias more effectively than baseline approaches.

02 Popular LLMs exhibit significant implicit bias across multiple bias types.

03 The benchmarks enable systematic comparison of bias levels across models.

Abstract

As large language models (LLMs) become an important means of accessing information, concerns are growing that they may intensify the spread of unethical content, including implicit bias that harms certain populations without using explicitly harmful words. In this paper, we rigorously evaluate LLMs' implicit bias towards certain demographics by attacking them from a psychometric perspective to elicit agreement with biased viewpoints. Inspired by psychometric principles in cognitive and social psychology, we propose three attack approaches, i.e., Disguise, Deception, and Teaching. Incorporating the corresponding attack instructions, we built two benchmarks: (1) a bilingual dataset with biased statements covering four bias types (2.7K instances) for extensive comparative analysis, and (2) BUMBLE, a larger benchmark spanning nine common bias types (12.7K instances) for…
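The abstract describes the evaluation as eliciting agreement with biased viewpoints under different attack instructions. A minimal sketch of how such agreement might be tallied per attack is given below. It is not the authors' code: the `agreement_rate` helper, the record layout, and the toy judgments are all hypothetical placeholders, and the paper's actual metric may be defined differently.

```python
from collections import defaultdict


def agreement_rate(records):
    """Return the fraction of responses that agreed, grouped by attack name.

    `records` is an iterable of (attack_name, agreed) pairs, where `agreed`
    is True if the model's response was judged to agree with the statement.
    This layout is an assumption made for illustration only.
    """
    totals = defaultdict(int)
    agreed_counts = defaultdict(int)
    for attack, did_agree in records:
        totals[attack] += 1
        agreed_counts[attack] += int(did_agree)
    return {attack: agreed_counts[attack] / totals[attack] for attack in totals}


# Made-up placeholder judgments, NOT results from the paper.
toy_records = [
    ("Disguise", True),
    ("Disguise", False),
    ("Deception", True),
    ("Teaching", True),
    ("Teaching", True),
    ("Baseline", False),
    ("Baseline", False),
]

rates = agreement_rate(toy_records)
for attack, rate in sorted(rates.items()):
    print(f"{attack}: {rate:.2f}")
```

Grouping by attack name keeps the comparison against a baseline explicit; the same tally could be keyed by bias type or by model to support the cross-model comparison the benchmarks are meant for.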

Code & Models

Repositories

wen112358/implicitbiaspsychometricevaluation
Official

Videos

Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective

Taxonomy

Topics: Text Readability and Simplification · Topic Modeling · Natural Language Processing Techniques

Methods: Linear Layer · Cosine Annealing · Multi-Head Attention · Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Attention Dropout