Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias
Itay Itzhak, Gabriel Stanovsky, Nir Rosenfeld, Yonatan Belinkov

TL;DR
This paper investigates how instruction tuning and reinforcement learning from human feedback (RLHF) influence cognitive biases in large language models, revealing that such tuning can strengthen the decoy effect, the certainty effect, and belief bias.
Contribution
It shows that instruction tuning can amplify cognitive biases across several families of large language models, highlighting a potential adverse effect of current fine-tuning methods.
Findings
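The decoy effect, the certainty effect, and belief bias appear in models from the GPT-3, Mistral, and T5 families.
Biases are stronger in models that have undergone instruction tuning, such as Flan-T5, Mistral-Instruct, GPT-3.5, and GPT-4.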
Understanding biases is crucial for developing reliable language models.
Abstract
Recent studies show that instruction tuning (IT) and reinforcement learning from human feedback (RLHF) improve the abilities of large language models (LMs) dramatically. While these tuning methods can help align models with human objectives and generate high-quality text, not much is known about their potential adverse effects. In this work, we investigate the effect of IT and RLHF on decision making and reasoning in LMs, focusing on three cognitive biases: the decoy effect, the certainty effect, and the belief bias, all of which are known to influence human decision-making and reasoning. Our findings highlight the presence of these biases in various models from the GPT-3, Mistral, and T5 families. Notably, we find a stronger presence of biases in models that have undergone instruction tuning, such as Flan-T5, Mistral-Instruct, GPT-3.5, and GPT-4. Our work constitutes a step toward…
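To make the three biases concrete, here is a minimal sketch that scores each one from a model's sampled answers. The function names, toy answers, and exact score definitions are hypothetical illustrations, not the metrics used in the paper. The decoy score compares how often a target option is chosen with and without a dominated decoy option; the certainty score compares choices between a version of a problem where one option is a sure outcome and a version where all probabilities are scaled down by the same factor (which leaves the expected-value ranking unchanged); and the belief-bias score compares syllogism accuracy when a conclusion's believability agrees with its logical validity versus when the two conflict.

```python
from typing import Sequence

# Hypothetical scoring functions: illustrative assumptions, NOT the paper's
# exact metrics. A positive score means the model leans in the biased direction.


def choice_rate(answers: Sequence[str], option: str) -> float:
    """Fraction of sampled answers that select `option`."""
    if not answers:
        raise ValueError("answers must be non-empty")
    return sum(answer == option for answer in answers) / len(answers)


def accuracy(correct: Sequence[bool]) -> float:
    """Fraction of True entries in a list of per-item correctness flags."""
    if not correct:
        raise ValueError("correct must be non-empty")
    return sum(correct) / len(correct)


def decoy_score(without_decoy: Sequence[str], with_decoy: Sequence[str], target: str) -> float:
    """Decoy effect: rise in how often the target is chosen once a decoy
    (an option dominated by the target) is added to the choice set."""
    return choice_rate(with_decoy, target) - choice_rate(without_decoy, target)


def certainty_score(certain_version: Sequence[str], scaled_version: Sequence[str], sure_option: str) -> float:
    """Certainty effect: how much more often `sure_option` (certain in the first
    version) is chosen than in a version where every probability is scaled
    down by the same factor, so the expected-value ranking is unchanged."""
    return choice_rate(certain_version, sure_option) - choice_rate(scaled_version, sure_option)


def belief_bias_score(consistent: Sequence[bool], conflicting: Sequence[bool]) -> float:
    """Belief bias: accuracy on syllogisms whose believability agrees with their
    validity minus accuracy on syllogisms where the two conflict."""
    return accuracy(consistent) - accuracy(conflicting)


# Toy answers, made up for illustration (not results from the paper).
decoy = decoy_score(["A", "B", "A", "B"], ["A", "A", "A", "B"], target="A")
certainty = certainty_score(["A", "A", "A", "B"], ["A", "B", "B", "B"], sure_option="A")
belief = belief_bias_score([True, True, True, False], [True, False, False, False])
print(decoy, certainty, belief)  # 0.25 0.5 0.5
```

Under these assumed definitions, computing the same scores for a base model and its instruction-tuned counterpart (for example T5 and Flan-T5) is one way to check whether tuning strengthens a given bias.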
Taxonomy
Topics: Text Readability and Simplification · Intelligent Tutoring Systems and Adaptive Learning
Methods: Attention Is All You Need · Weight Decay · Linear Layer · Dense Connections · Adam · Adafactor · Gated Linear Unit · Attention Dropout · Inverse Square Root Schedule
