Understanding the Anchoring Effect of LLM with Synthetic Data: Existence, Mechanism, and Potential Mitigations
Yiming Huang, Biquan Bie, Zuqiu Na, Weilin Ruan, Songxin Lei, Yutao Yue, Xinlei He

TL;DR
This paper investigates the presence and mechanisms of anchoring bias in Large Language Models, introduces a synthetic dataset for large-scale study, and evaluates mitigation strategies, finding that the bias is prevalent and hard to eliminate.
Contribution
It introduces SynAnchors, a new dataset for studying LLM anchoring bias, and benchmarks current models, revealing persistent bias and partial mitigation through reasoning.
Findings
LLMs commonly exhibit anchoring bias, which acts in the shallow layers.
Conventional mitigation strategies do not eliminate the anchoring bias.
Reasoning can partially mitigate the anchoring effect.
Abstract
The rise of Large Language Models (LLMs) like ChatGPT has advanced natural language processing, yet concerns about their cognitive biases are growing. In this paper, we investigate the anchoring effect, a cognitive bias in which the mind relies heavily on the first piece of information it receives (the anchor) when making later judgments. We explore whether LLMs are affected by anchoring, the underlying mechanisms, and potential mitigation strategies. To facilitate studies of the anchoring effect at scale, we introduce a new dataset, SynAnchors (https://huggingface.co/datasets/TimTargaryen/SynAnchors). Using refined evaluation metrics, we benchmark widely used LLMs. Our findings show that anchoring bias is common in LLMs, acts in the shallow layers, and cannot be eliminated by conventional strategies, while reasoning offers some mitigation.
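To make the idea of measuring anchoring concrete, here is a minimal sketch of the classic anchoring index from the psychology literature: the shift in median estimates between a high-anchor and a low-anchor condition, divided by the gap between the two anchors. This is an assumption chosen for illustration and is not necessarily one of the paper's refined metrics. The function name `anchoring_index` and the sample answers are hypothetical.

```python
from statistics import median


def anchoring_index(low_estimates, high_estimates, low_anchor, high_anchor):
    """Classic anchoring index (illustrative; not the paper's own metric).

    0 means the anchors did not move the median estimate at all;
    1 means the median estimate moved as far as the anchors did.
    """
    if high_anchor == low_anchor:
        raise ValueError("the two anchors must differ")
    shift = median(high_estimates) - median(low_estimates)
    return shift / (high_anchor - low_anchor)


# Hypothetical numeric answers from a model asked the same question
# after being shown a low anchor (10) or a high anchor (90).
low_answers = [20, 25, 30]
high_answers = [55, 60, 70]

index = anchoring_index(low_answers, high_answers, low_anchor=10, high_anchor=90)
print(f"anchoring index: {index:.4f}")
```

A value near 0 would suggest the answers ignore the anchor, while a value near 1 would suggest they track it closely. Medians are used so that a single extreme answer does not dominate the result.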
