Exploring Causal Effect of Social Bias on Faithfulness Hallucinations in Large Language Models
Zhenliang Zhang, Junzhe Zhang, Xinyu Hu, HuiXuan Zhang, Xiaojun Wan

TL;DR
This paper investigates how social biases causally influence faithfulness hallucinations in large language models, using causal modeling and a new bias dataset to reveal significant bias effects on hallucination generation.
Contribution
It introduces a causal framework with the Bias Intervention Dataset to measure social bias effects on hallucinations in LLMs, advancing understanding of bias-related hallucinations.
Findings
Social biases are significant causes of faithfulness hallucinations in LLMs
Different social biases have varying effects on hallucination direction
Bias interventions can reduce unfairness hallucinations
Abstract
Large language models (LLMs) have achieved remarkable success in various tasks, yet they remain vulnerable to faithfulness hallucinations, where the output does not align with the input. In this study, we investigate whether social bias contributes to these hallucinations, a causal relationship that has not been explored. A key challenge is controlling confounders within the context, which complicates the isolation of causality between bias states and hallucinations. To address this, we utilize the Structural Causal Model (SCM) to establish and validate the causality and design bias interventions to control confounders. In addition, we develop the Bias Intervention Dataset (BID), which includes various social biases, enabling precise measurement of causal effects. Experiments on mainstream LLMs reveal that biases are significant causes of faithfulness hallucinations, and the effect of…
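The interventional comparison described in the abstract can be illustrated with a minimal sketch. It assumes each context is run once under a neutral state and once under a biased state, so the context is held fixed while only the bias state changes, and it estimates the effect as the difference in hallucination rates. The Trial record, the state labels "neutral" and "biased", and the toy outcomes are hypothetical illustrations, not the paper's data, metric, or implementation.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    context_id: int     # shared context; holding it fixed controls the confounder
    bias_state: str     # hypothetical labels: "neutral" or "biased"
    hallucinated: bool  # True if the answer contradicted the input context


def hallucination_rate(trials, bias_state):
    """Fraction of trials in the given bias state that hallucinated."""
    outcomes = [t.hallucinated for t in trials if t.bias_state == bias_state]
    if not outcomes:
        raise ValueError(f"no trials for bias state {bias_state!r}")
    return sum(outcomes) / len(outcomes)


def average_effect(trials, treated, control="neutral"):
    """Difference in hallucination rate between an intervened state and control."""
    return hallucination_rate(trials, treated) - hallucination_rate(trials, control)


# Toy outcomes (invented for illustration): four contexts, each run in both states.
neutral_outcomes = [False, False, True, False]
biased_outcomes = [True, False, True, True]

trials = []
for cid, (n, b) in enumerate(zip(neutral_outcomes, biased_outcomes)):
    trials.append(Trial(cid, "neutral", n))
    trials.append(Trial(cid, "biased", b))

effect = average_effect(trials, "biased")
print(f"estimated effect of bias intervention: {effect:+.2f}")
```

Because every context appears under both states, any difference in rates cannot be attributed to differences between contexts, which is the point of pairing the interventions. A real analysis would also need a significance test and many more samples.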
Taxonomy
Topics: Topic Modeling · Mental Health via Writing · Adversarial Robustness in Machine Learning
