CogBias: Measuring and Mitigating Cognitive Bias in Large Language Models
Fan Huang, Songheng Zhang, Haewoon Kwak, Jisun An

TL;DR
This paper introduces CogBias, a benchmark for measuring cognitive biases in large language models across four bias families (Judgment, Information Processing, Social, and Response), and evaluates prompt-level debiasing and activation steering as ways to mitigate them.
Contribution
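The paper defines LLM cognitive bias as systematic, reproducible deviation from correct answers on tasks with computable ground-truth baselines, builds a benchmark for evaluating such bias across four families, and shows that activation steering reduces bias with minimal impact on performance.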
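The paper's definition (systematic, reproducible deviations from correct answers on tasks with a computable ground truth) suggests a simple way to quantify bias on numeric tasks. The following is a minimal sketch under that reading, not the benchmark's actual scoring code: the `Trial` record and both metric functions are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    """One model response to a task with a computable correct answer (hypothetical record)."""
    ground_truth: float
    model_answer: float


def deviation_rate(trials, tol=1e-9):
    """Fraction of trials whose answer differs from the ground truth by more than tol."""
    return mean(1.0 if abs(t.model_answer - t.ground_truth) > tol else 0.0 for t in trials)


def mean_signed_deviation(trials):
    """Average of (answer - ground truth); a value far from 0 indicates a directional skew."""
    return mean(t.model_answer - t.ground_truth for t in trials)


# Illustrative data only: four repeated trials on a task whose correct answer is 10.
trials = [Trial(10.0, 12.0), Trial(10.0, 13.0), Trial(10.0, 10.0), Trial(10.0, 11.0)]
rate = deviation_rate(trials)
skew = mean_signed_deviation(trials)
```

A high deviation rate with a signed deviation near zero would look more like noise than bias; the "systematic" part of the definition corresponds to errors that repeat in a consistent direction.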
Findings
Cognitive biases are systematically present across all four bias families.
Prompt-level debiasing reduces Response biases but can worsen Judgment biases.
Activation steering achieves 26-32% bias reduction with minimal performance loss.
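For readers unfamiliar with activation steering, the sketch below shows one common generic formulation: estimate a direction as the difference between mean activations on biased and neutral examples, then remove a scaled projection onto that direction from a hidden state. This is an assumption about the general technique, not the authors' implementation; the function names, the `alpha` parameter, and the toy vectors are all hypothetical.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_direction(biased_acts, neutral_acts):
    """Unit vector pointing from the neutral mean toward the biased mean (difference of means)."""
    diff = [b - n for b, n in zip(mean_vector(biased_acts), mean_vector(neutral_acts))]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("biased and neutral means coincide; no direction to steer along")
    return [d / norm for d in diff]


def steer(hidden, direction, alpha=1.0):
    """Subtract alpha times the projection of hidden onto direction (alpha is an assumed knob)."""
    proj = sum(h * d for h, d in zip(hidden, direction))
    return [h - alpha * proj * d for h, d in zip(hidden, direction)]


# Toy 2-D activations, for illustration only.
direction = steering_direction([[2.0, 0.0], [4.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
fully_steered = steer([3.0, 5.0], direction)
half_steered = steer([3.0, 5.0], direction, alpha=0.5)
```

With `alpha=1.0` the component along the estimated direction is removed entirely; smaller values trade bias reduction against how much the original representation is changed. In a real model this operation would be applied to a layer's hidden states during the forward pass.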
Abstract
Large Language Models (LLMs) are increasingly deployed in high-stakes decision-making contexts. While prior work has shown that LLMs exhibit cognitive biases behaviorally, whether these biases correspond to identifiable internal representations and can be mitigated through targeted intervention remains an open question. We define LLM cognitive bias as systematic, reproducible deviations from correct answers in tasks with computable ground-truth baselines, and introduce LLM CogBias, a benchmark organized around four families of cognitive biases: Judgment, Information Processing, Social, and Response. We evaluate three LLMs and find that cognitive biases emerge systematically across all four families, with magnitudes and debiasing responses that are strongly family-dependent: prompt-level debiasing substantially reduces Response biases but backfires for Judgment biases. Using linear…
