Mind vs. Mouth: On Measuring Re-judge Inconsistency of Social Bias in Large Language Models
Yachao Zhao, Bo Wang, Dongming Zhao, Kun Huang, Yan Wang, Ruifang He, Yuexian Hou

TL;DR
This paper investigates the phenomenon of re-judge inconsistency in large language models, revealing parallels with human implicit and explicit social biases, and offers insights into their cognitive-like behaviors.
Contribution
It introduces a two-stage approach to measure social bias in LLMs and uncovers a stable re-judge inconsistency phenomenon analogous to human cognitive bias.
Findings
Re-judge inconsistency is highly stable across models.
LLMs exhibit parallel behaviors to human implicit and explicit biases.
Psychological theories can enhance understanding of LLM social biases.
Abstract
Recent research indicates that pre-trained Large Language Models (LLMs) possess cognitive constructs similar to those observed in humans, prompting researchers to investigate the cognitive aspects of LLMs. This paper focuses on explicit and implicit social bias, a distinctive two-level cognitive construct in psychology. It posits that an individual's explicit social bias, the bias they consciously express in their statements, may differ from their implicit social bias, which is unconscious. We propose a two-stage approach and discover a parallel phenomenon in LLMs known as "re-judge inconsistency" in social bias. In the first stage, the LLM is tasked with automatically completing statements, potentially incorporating implicit social bias. In the second stage, the same LLM re-judges the biased statement it generated and contradicts it. We…
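The two-stage procedure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `complete` and `rejudge`, the placeholder prompts, and the rate used as a score are all hypothetical, and the paper's actual prompts and metric are not given in this summary.

```python
from typing import Callable, Iterable


def rejudge_inconsistency_rate(
    prompts: Iterable[str],
    complete: Callable[[str], str],
    rejudge: Callable[[str], bool],
) -> float:
    """Fraction of self-generated statements the model rejects when re-judging.

    complete: stage 1, returns the model's completed statement for a prompt.
    rejudge:  stage 2, returns True if the same model agrees with the statement.
    Both callables are assumed wrappers around an LLM (hypothetical names).
    """
    # Stage 1: let the model complete each statement.
    statements = [complete(p) for p in prompts]
    if not statements:
        return 0.0
    # Stage 2: ask the same model to judge its own statements and
    # count the ones it now disagrees with.
    contradicted = sum(1 for s in statements if not rejudge(s))
    return contradicted / len(statements)


# Toy stand-ins so the sketch runs without a model (assumptions, not real LLM calls).
def toy_complete(prompt: str) -> str:
    return prompt + " <completion>"


def toy_rejudge(statement: str) -> bool:
    # Pretend the model only agrees with statements built from template T1.
    return statement.startswith("T1")


rate = rejudge_inconsistency_rate(
    ["T1 is", "T2 is", "T3 is", "T4 is"], toy_complete, toy_rejudge
)
print(rate)
```

In practice `complete` and `rejudge` would wrap calls to the same model, so that any disagreement reflects the model contradicting its own output rather than a difference between models.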
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Artificial Intelligence in Healthcare and Education
Methods: Attention Is All You Need · Linear Layer · Dropout · Byte Pair Encoding · Adam · Position-Wise Feed-Forward Layer · Multi-Head Attention · Layer Normalization · Absolute Position Encodings · Residual Connection
