Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias

Itay Itzhak; Gabriel Stanovsky; Nir Rosenfeld; Yonatan Belinkov

arXiv:2308.00225 · cs.AI · April 2, 2024 · 6 cites

TL;DR

This paper investigates how instruction tuning (IT) and reinforcement learning from human feedback (RLHF) affect cognitive biases in large language models. It finds that tuned models show a stronger decoy effect, certainty effect, and belief bias.

Contribution

The paper shows that instruction tuning amplifies cognitive biases in models from the GPT-3, Mistral, and T5 families, pointing to a potential adverse effect of current fine-tuning methods.

Findings

01

The decoy effect, the certainty effect, and the belief bias are present in models from the GPT-3, Mistral, and T5 families.

02

Models that have undergone instruction tuning, such as Flan-T5, Mistral-Instruct, GPT3.5, and GPT4, show a stronger presence of these biases.

03

Understanding biases is crucial for developing reliable language models.

Abstract

Recent studies show that instruction tuning (IT) and reinforcement learning from human feedback (RLHF) improve the abilities of large language models (LMs) dramatically. While these tuning methods can help align models with human objectives and generate high-quality text, not much is known about their potential adverse effects. In this work, we investigate the effect of IT and RLHF on decision making and reasoning in LMs, focusing on three cognitive biases - the decoy effect, the certainty effect, and the belief bias - all of which are known to influence human decision-making and reasoning. Our findings highlight the presence of these biases in various models from the GPT-3, Mistral, and T5 families. Notably, we find a stronger presence of biases in models that have undergone instruction tuning, such as Flan-T5, Mistral-Instruct, GPT3.5, and GPT4. Our work constitutes a step toward…
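To make the certainty effect concrete, here is a minimal Python sketch. The payoffs follow the classic textbook form of the problem (a sure amount versus a riskier gamble with a higher expected value). The function names, the choice counts, and the bias score defined as a difference in choice rates between a treatment and a control condition are assumptions for illustration only; they are not taken from the paper or its repository.

```python
from fractions import Fraction


def expected_value(outcomes):
    """Expected value of a gamble given (probability, payoff) pairs."""
    return sum(Fraction(p) * payoff for p, payoff in outcomes)


def choice_rate(target_choices, total):
    """Share of responses that picked the target option."""
    return Fraction(target_choices, total)


def bias_score(treatment_rate, control_rate):
    """Hypothetical score: how much more often the target option is
    chosen in the treatment condition than in the control condition."""
    return treatment_rate - control_rate


# Treatment: one option is certain, the other has a higher expected value.
sure_thing = [(1, 3000)]
gamble = [(Fraction(4, 5), 4000), (Fraction(1, 5), 0)]

# Control: both probabilities scaled by 1/4, so neither option is certain,
# but the ordering of expected values is unchanged.
control_a = [(Fraction(1, 4), 3000), (Fraction(3, 4), 0)]
control_b = [(Fraction(1, 5), 4000), (Fraction(4, 5), 0)]

print(expected_value(sure_thing), expected_value(gamble))    # 3000 3200
print(expected_value(control_a), expected_value(control_b))  # 750 800

# Made-up counts, NOT results from the paper: 70 of 100 responses pick
# the sure option in treatment, 40 of 100 pick its analogue in control.
score = bias_score(choice_rate(70, 100), choice_rate(40, 100))
print(score)  # 3/10
```

A decision-maker who only maximized expected value would pick the gamble in both conditions, so a positive score under this toy definition would indicate a preference shift caused by certainty alone.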

Code & Models

Repositories

itay1itzhak/instructedtobias
Official

Taxonomy

Topics: Text Readability and Simplification · Intelligent Tutoring Systems and Adaptive Learning

Methods: Attention Is All You Need · Weight Decay · Linear Layer · Dense Connections · Adam · Adafactor · Gated Linear Unit · Attention Dropout · Inverse Square Root Schedule