Anchors in the Machine: Behavioral and Attributional Evidence of Anchoring Bias in LLMs
Felipe Valencia-Clavijo

TL;DR
This paper tests whether and how anchoring bias arises in large language models, combining log-probability-based behavioral analysis with Shapley-value attribution to show how anchors shift model outputs and the weight the model gives to each part of the prompt.
Contribution
The paper introduces a framework that combines log-probability analysis with exact Shapley-value attribution over structured prompt fields to quantify and interpret anchoring bias in LLMs.
Findings
Robust anchoring effects observed in several open-source LLMs.
Anchors influence entire output distributions and internal reweighting.
Model sensitivity to anchors varies with size and prompt design.
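To make the idea of an anchor shifting an entire output distribution concrete, here is a minimal sketch. It is not the paper's metric or data: the candidate answers, the log-probability values, and the helper names `normalize` and `expected_value` are all hypothetical, chosen only to show how per-candidate log-probabilities can be compared across a low-anchor and a high-anchor prompt.

```python
import math

def normalize(logprobs):
    """Turn unnormalized log-probabilities over candidates into a distribution."""
    m = max(logprobs.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logprobs.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

def expected_value(dist):
    """Mean of a distribution over numeric candidate answers."""
    return sum(k * p for k, p in dist.items())

# Hypothetical log-probabilities for three candidate numeric answers,
# scored once after a low anchor and once after a high anchor.
low_anchor_logprobs = {10: -0.5, 50: -1.5, 90: -3.0}
high_anchor_logprobs = {10: -3.0, 50: -1.5, 90: -0.5}

p_low = normalize(low_anchor_logprobs)
p_high = normalize(high_anchor_logprobs)

# A positive shift means probability mass moved toward the higher anchor.
shift = expected_value(p_high) - expected_value(p_low)
print(f"expected answer (low anchor):  {expected_value(p_low):.2f}")
print(f"expected answer (high anchor): {expected_value(p_high):.2f}")
print(f"shift: {shift:.2f}")
```

Comparing whole distributions rather than a single sampled answer is what distinguishes this kind of analysis from surface-level output checks; the expected-value difference used here is only one of several possible summaries.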
Abstract
Large language models (LLMs) are increasingly examined as both behavioral subjects and decision systems, yet it remains unclear whether observed cognitive biases reflect surface imitation or deeper probability shifts. Anchoring bias, a classic human judgment bias, offers a critical test case. While prior work shows LLMs exhibit anchoring, most evidence relies on surface-level outputs, leaving internal mechanisms and attributional contributions unexplored. This paper advances the study of anchoring in LLMs through three contributions: (1) a log-probability-based behavioral analysis showing that anchors shift entire output distributions, with controls for training-data contamination; (2) exact Shapley-value attribution over structured prompt fields to quantify anchor influence on model log-probabilities; and (3) a unified Anchoring Bias Sensitivity Score integrating behavioral and…
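Exact Shapley-value attribution over a small set of prompt fields can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the field names, the `exact_shapley` helper, and the `toy_logprob` value function are hypothetical. In a real setting the value function would query a model for the log-probability of a target answer given only the fields in the coalition.

```python
from itertools import combinations
from math import factorial

def exact_shapley(fields, value_fn):
    """Exact Shapley value of each field.

    value_fn takes a frozenset of present fields and returns a score
    (for example, a log-probability). Cost grows as 2^n in the number
    of fields, so this is only practical for a handful of fields.
    """
    n = len(fields)
    phi = {f: 0.0 for f in fields}
    for f in fields:
        others = [g for g in fields if g != f]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[f] += weight * (value_fn(s | {f}) - value_fn(s))
    return phi

def toy_logprob(present):
    """Hypothetical stand-in for a model log-probability of a target answer."""
    score = -5.0
    if "anchor" in present:
        score += 2.0
    if "question" in present:
        score += 1.0
    if "anchor" in present and "question" in present:
        score += 0.5  # interaction term, split evenly by Shapley
    return score

FIELDS = ["instruction", "anchor", "question"]  # assumed field names
shapley = exact_shapley(FIELDS, toy_logprob)
for name, value in shapley.items():
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the score with all fields present and the score with none, which is the efficiency property that makes Shapley values convenient for splitting a log-probability change among prompt fields.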
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Explainable Artificial Intelligence (XAI)
