Epistemic Bias Injection: Biasing LLMs via Selective Context Retrieval
Hao Wu, Prateek Saxena

TL;DR
This paper introduces Epistemic Bias Injection, a subtle attack on retrieval-augmented LLMs that manipulates retrieved context with truthful yet biased information, and proposes a metric and defense to mitigate this threat.
Contribution
The paper presents a novel geometric metric for quantifying epistemic bias, demonstrates effective bias injection attacks, and introduces BiasDef, a lightweight defense for RAG-enabled LLMs.
Findings
EBI can significantly shift perspectives in LLM outputs.
Existing defenses are ineffective against EBI attacks.
BiasDef reduces bias and improves retrieval sanitization.
Abstract
When answering user queries, LLMs often retrieve knowledge from external sources stored in retrieval-augmented generation (RAG) databases. These are often populated from unvetted sources, e.g. the open web, and can contain maliciously crafted data. This paper studies attacks that can manipulate the context retrieved by LLMs from such RAG databases. Prior work on such context manipulation primarily injects false or toxic content, which can often be detected by fact-checking or linguistic analysis. We reveal a more subtle threat, Epistemic Bias Injection (EBI), in which adversaries inject factually correct yet epistemically biased passages that systematically emphasize one side of a multi-viewpoint issue. Although linguistically coherent and truthful, such adversarial passages effectively crowd out alternative viewpoints and steer model outputs toward an attacker-chosen stance. As a…
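The crowding-out effect described in the abstract can be illustrated with a toy top-k retriever. This is only a minimal sketch under stated assumptions, not the paper's geometric metric or BiasDef: the 2-D vectors, the "pro"/"con" stance labels, and the helper names (`cosine`, `retrieve`, `stance_share`) are all hypothetical stand-ins chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query, corpus, k):
    """Return the k passages most similar to the query (toy dense retriever)."""
    ranked = sorted(corpus, key=lambda p: cosine(query, p["vec"]), reverse=True)
    return ranked[:k]

def stance_share(passages, stance):
    """Fraction of the retrieved passages that carry the given stance label."""
    return sum(1 for p in passages if p["stance"] == stance) / len(passages)

# Hypothetical query embedding and a balanced corpus (two passages per stance).
query = (1.0, 0.0)
clean_corpus = [
    {"id": "pro-1", "stance": "pro", "vec": (0.9, 0.3)},
    {"id": "pro-2", "stance": "pro", "vec": (0.8, 0.5)},
    {"id": "con-1", "stance": "con", "vec": (0.95, -0.3)},
    {"id": "con-2", "stance": "con", "vec": (0.8, -0.5)},
]

# Assumed adversarial passages: one-sided, and placed close to the query
# in embedding space so that they outrank the original passages.
injected = [
    {"id": f"adv-{i}", "stance": "pro", "vec": (1.0, 0.05 * (i + 1))}
    for i in range(3)
]

k = 4
clean_share = stance_share(retrieve(query, clean_corpus, k), "pro")
poisoned_share = stance_share(retrieve(query, clean_corpus + injected, k), "pro")

print(f"pro share in top-{k}, clean corpus:    {clean_share:.2f}")
print(f"pro share in top-{k}, poisoned corpus: {poisoned_share:.2f}")
```

In this toy setup the injected passages take three of the four retrieval slots, so the retrieved context moves from an even split to a majority for one stance even though no passage is labeled as false. Real retrievers, embeddings, and stance measurements are far richer than this illustration.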
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Adversarial Robustness in Machine Learning
