Exploring Causal Effect of Social Bias on Faithfulness Hallucinations in Large Language Models

Zhenliang Zhang; Junzhe Zhang; Xinyu Hu; HuiXuan Zhang; Xiaojun Wan

arXiv:2508.07753·cs.CL·August 12, 2025

TL;DR

This paper investigates how social biases causally influence faithfulness hallucinations in large language models, using causal modeling and a new bias dataset to reveal significant bias effects on hallucination generation.

Contribution

The paper introduces a causal framework, together with the Bias Intervention Dataset (BID), for measuring the effect of social bias on faithfulness hallucinations in LLMs.

Findings

01 Bias is a significant cause of faithfulness hallucinations in LLMs

02 Different social biases have varying effects on hallucination direction

03 Bias interventions can reduce unfairness hallucinations

Abstract

Large language models (LLMs) have achieved remarkable success in various tasks, yet they remain vulnerable to faithfulness hallucinations, where the output does not align with the input. In this study, we investigate whether social bias contributes to these hallucinations, a causal relationship that has not been explored. A key challenge is controlling confounders within the context, which complicates the isolation of causality between bias states and hallucinations. To address this, we utilize the Structural Causal Model (SCM) to establish and validate the causality and design bias interventions to control confounders. In addition, we develop the Bias Intervention Dataset (BID), which includes various social biases, enabling precise measurement of causal effects. Experiments on mainstream LLMs reveal that biases are significant causes of faithfulness hallucinations, and the effect of…
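
The intervention idea in the abstract can be illustrated with a minimal sketch. It assumes each context is run twice, once with the bias cue neutralized and once with it present, so that everything else in the context is held fixed. The names `PairedOutcome`, `hallucination_rate`, and `average_causal_effect` are hypothetical, the toy outcomes are invented for illustration, and the estimator is a plain difference in hallucination rates, not necessarily the measure used in the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class PairedOutcome:
    """Hypothetical record: one context evaluated under both bias states."""
    hallucinated_neutral: bool  # outcome with the bias cue neutralized
    hallucinated_biased: bool   # outcome with the bias cue present


def hallucination_rate(flags: Iterable[bool]) -> float:
    """Fraction of outputs flagged as unfaithful to the input."""
    flags = list(flags)
    if not flags:
        raise ValueError("need at least one outcome")
    return sum(flags) / len(flags)


def average_causal_effect(pairs: Sequence[PairedOutcome]) -> float:
    """Difference in hallucination rate between the two intervened bias states.

    Because both runs share the same context, confounders inside the
    context are held constant across the comparison.
    """
    biased = hallucination_rate(p.hallucinated_biased for p in pairs)
    neutral = hallucination_rate(p.hallucinated_neutral for p in pairs)
    return biased - neutral


# Invented toy outcomes, not results from the paper.
toy_pairs = [
    PairedOutcome(hallucinated_neutral=False, hallucinated_biased=True),
    PairedOutcome(hallucinated_neutral=False, hallucinated_biased=False),
    PairedOutcome(hallucinated_neutral=True, hallucinated_biased=True),
    PairedOutcome(hallucinated_neutral=False, hallucinated_biased=True),
]
effect = average_causal_effect(toy_pairs)
```

A positive value means the biased version of a context produced more hallucinations than its neutral counterpart; a value near zero would suggest the bias cue had little effect on faithfulness in this sample.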

Taxonomy

Topics: Topic Modeling · Mental Health via Writing · Adversarial Robustness in Machine Learning