Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs

Itay Itzhak; Yonatan Belinkov; Gabriel Stanovsky

arXiv:2507.07186 · cs.CL · July 15, 2025

TL;DR

This study investigates whether cognitive biases in large language models originate mainly from pretraining or finetuning, revealing that biases are primarily shaped during pretraining, with implications for bias mitigation strategies.

Contribution

The paper introduces a causal experimental approach to disentangle the sources of bias in LLMs, demonstrating that bias patterns are determined largely by pretraining rather than by finetuning.

Findings

1. Bias variability is influenced by training randomness.
2. Pretraining has a stronger impact on biases than finetuning.
3. Bias patterns are more similar among models with the same pretraining backbone.
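Finding 3 can be made concrete with a small pairwise comparison. The sketch below illustrates the idea and is not the paper's analysis code. It assumes each finetuned model is summarized as a vector of per-bias scores and uses cosine similarity as the comparison metric (an assumption). The backbone names `A`/`B`, dataset names `X`/`Y`, and all scores are hypothetical toy placeholders.

```python
import math
from itertools import combinations

# Toy per-bias score vectors (one entry per cognitive bias), keyed by
# (backbone, instruction_dataset). Names and numbers are hypothetical
# placeholders, NOT results from the paper.
BIAS_SCORES = {
    ("A", "X"): [0.8, 0.1, 0.5, 0.3],
    ("A", "Y"): [0.7, 0.2, 0.6, 0.3],
    ("B", "X"): [0.1, 0.9, 0.2, 0.7],
    ("B", "Y"): [0.2, 0.8, 0.1, 0.6],
}


def cosine(u, v):
    """Cosine similarity between two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(scores, keep_pair):
    """Average cosine similarity over all model pairs accepted by keep_pair."""
    sims = [
        cosine(scores[m1], scores[m2])
        for m1, m2 in combinations(scores, 2)
        if keep_pair(m1, m2)
    ]
    return sum(sims) / len(sims)


# Pairs that share the backbone but were tuned on different instruction data.
same_backbone = mean_similarity(
    BIAS_SCORES, lambda m1, m2: m1[0] == m2[0] and m1[1] != m2[1]
)
# Pairs that share the instruction data but have different backbones.
same_dataset = mean_similarity(
    BIAS_SCORES, lambda m1, m2: m1[1] == m2[1] and m1[0] != m2[0]
)

print(f"same backbone: {same_backbone:.3f}")
print(f"same dataset:  {same_dataset:.3f}")
```

The toy vectors were chosen so that the same-backbone average comes out higher, which matches the pattern Finding 3 describes. With real data, the same comparison would run over the actual models' bias scores. Pairs that differ in both backbone and dataset are excluded so that each average isolates one factor.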

Abstract

Large language models (LLMs) exhibit cognitive biases: systematic tendencies toward irrational decision-making, similar to those seen in humans. Prior work has found that these biases vary across models and can be amplified by instruction tuning. However, it remains unclear whether these differences in biases stem from pretraining, finetuning, or even random noise due to training stochasticity. We propose a two-step causal experimental approach to disentangle these factors. First, we finetune models multiple times using different random seeds to study how training randomness affects over 30 cognitive biases. Second, we introduce cross-tuning: swapping instruction datasets between models to isolate bias sources. This swap uses datasets that led to different bias patterns, directly testing whether biases are dataset-dependent. Our findings reveal that while training randomness…
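The two-step design can be pictured as a full grid of finetuning runs in which each backbone is tuned on each instruction dataset under several seeds. The following is a minimal sketch under stated assumptions. The backbone names, dataset names, and seeds are hypothetical placeholders. Averaging per-bias scores over seeds to damp training randomness is an illustrative choice and is not necessarily the authors' procedure.

```python
from itertools import product
from statistics import mean

# Hypothetical placeholders: two pretrained backbones, two instruction
# datasets, and three finetuning seeds.
BACKBONES = ["A", "B"]
DATASETS = ["X", "Y"]
SEEDS = [0, 1, 2]


def cross_tuning_runs(backbones, datasets, seeds):
    """List every (backbone, dataset, seed) finetuning run.

    Tuning each backbone on every dataset includes the swapped pairings,
    so backbone and instruction data can be varied independently.
    """
    return list(product(backbones, datasets, seeds))


def seed_averaged(scores):
    """Average per-bias score vectors over seeds.

    `scores` maps (backbone, dataset, seed) -> list of per-bias scores;
    the result maps (backbone, dataset) -> element-wise mean over seeds.
    """
    grouped = {}
    for (backbone, dataset, _seed), vec in scores.items():
        grouped.setdefault((backbone, dataset), []).append(vec)
    return {
        key: [mean(column) for column in zip(*vecs)]
        for key, vecs in grouped.items()
    }


runs = cross_tuning_runs(BACKBONES, DATASETS, SEEDS)
print(len(runs))  # 2 backbones x 2 datasets x 3 seeds = 12

# Toy scores for two seeds of one run (illustrative values only).
toy = {("A", "X", 0): [0.25, 0.5], ("A", "X", 1): [0.75, 1.0]}
print(seed_averaged(toy))  # {('A', 'X'): [0.5, 0.75]}
```

Keeping the seed as an explicit grid dimension means that seed-to-seed spread (step one) and backbone-versus-dataset effects (step two) can both be read from the same set of runs.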

Code & Models

Datasets

itay1itzhak/flan_2022_350k

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning