Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs
Itay Itzhak, Yonatan Belinkov, Gabriel Stanovsky

TL;DR
This study investigates whether cognitive biases in large language models originate mainly from pretraining or finetuning, revealing that biases are primarily shaped during pretraining, with implications for bias mitigation strategies.
Contribution
The paper introduces a causal experimental approach to disentangle bias sources in LLMs, demonstrating that pretraining, rather than finetuning, largely determines bias patterns.
Findings
Bias variability is influenced by training randomness.
Pretraining has a stronger impact on biases than finetuning.
Bias patterns are more similar among models with the same pretraining backbone.
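The third finding can be made concrete with a small sketch. It is only an illustration of the kind of comparison involved, not the paper's analysis: the bias scores below are made-up toy values, the use of cosine similarity is an assumption, and the names `cosine` and `mean_similarity_by_pair_type` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity_by_pair_type(models):
    """Average pairwise similarity for two kinds of model pairs.

    'same_backbone': pairs sharing a pretraining backbone but not a dataset.
    'same_dataset':  pairs sharing an instruction dataset but not a backbone.
    """
    buckets = {"same_backbone": [], "same_dataset": []}
    for a, b in combinations(models, 2):
        sim = cosine(a["scores"], b["scores"])
        same_backbone = a["backbone"] == b["backbone"]
        same_dataset = a["dataset"] == b["dataset"]
        if same_backbone and not same_dataset:
            buckets["same_backbone"].append(sim)
        elif same_dataset and not same_backbone:
            buckets["same_dataset"].append(sim)
    return {name: sum(vals) / len(vals) for name, vals in buckets.items() if vals}


# Toy, invented bias-score vectors (one entry per hypothetical bias test).
# They are NOT results from the paper; they only exercise the function.
toy_models = [
    {"backbone": "OLMo", "dataset": "Tulu", "scores": [0.30, -0.10, 0.20]},
    {"backbone": "OLMo", "dataset": "Flan", "scores": [0.28, -0.08, 0.22]},
    {"backbone": "T5", "dataset": "Tulu", "scores": [-0.20, 0.25, 0.05]},
    {"backbone": "T5", "dataset": "Flan", "scores": [-0.18, 0.30, 0.02]},
]

result = mean_similarity_by_pair_type(toy_models)
print(result)
```

If pretraining dominates, the `same_backbone` average should come out higher than the `same_dataset` average, which is how the toy values were chosen to behave.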
Abstract
Large language models (LLMs) exhibit cognitive biases -- systematic tendencies of irrational decision-making, similar to those seen in humans. Prior work has found that these biases vary across models and can be amplified by instruction tuning. However, it remains unclear whether these differences in biases stem from pretraining, finetuning, or even random noise due to training stochasticity. We propose a two-step causal experimental approach to disentangle these factors. First, we finetune models multiple times using different random seeds to study how training randomness affects cognitive biases. Second, we introduce cross-tuning -- swapping instruction datasets between models to isolate bias sources. This swap uses datasets that led to different bias patterns, directly testing whether biases are dataset-dependent. Our findings reveal that while training randomness…
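The two-step design in the abstract can be sketched as a small experiment grid: every backbone is paired with every instruction dataset (the cross-tuning swap), and each pairing is repeated over several seeds so that seed-level noise can be summarized. This is a minimal sketch under stated assumptions: the backbone, dataset, and seed values are read off the released checkpoint names on this page, while `build_runs`, `summarize_over_seeds`, and `placeholder_score` are hypothetical names, and the scoring function is a stand-in rather than a real bias evaluation.

```python
from itertools import product
from statistics import mean, stdev

# Values inferred from the checkpoint names listed on this page.
BACKBONES = ["OLMo", "T5"]
DATASETS = ["Tulu", "Flan"]
SEEDS = [0, 1, 2]


def build_runs(backbones, datasets, seeds):
    """Enumerate every backbone x dataset x seed finetuning run."""
    return [
        {"backbone": b, "dataset": d, "seed": s}
        for b, d, s in product(backbones, datasets, seeds)
    ]


def summarize_over_seeds(runs, score_fn):
    """Group runs by (backbone, dataset) and report mean and std across seeds."""
    grouped = {}
    for run in runs:
        key = (run["backbone"], run["dataset"])
        grouped.setdefault(key, []).append(score_fn(run))
    return {
        key: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0)
        for key, vals in grouped.items()
    }


def placeholder_score(run):
    """Hypothetical stand-in for a bias measurement; NOT real data."""
    base = {"OLMo": 0.3, "T5": -0.2}[run["backbone"]]
    return base + 0.01 * run["seed"]


runs = build_runs(BACKBONES, DATASETS, SEEDS)
summary = summarize_over_seeds(runs, placeholder_score)
for key, (avg, spread) in summary.items():
    print(key, round(avg, 3), round(spread, 3))
```

Averaging over seeds before comparing pairings keeps run-to-run noise from being mistaken for an effect of the backbone or the dataset.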
Code & Models
- 🤗itay1itzhak/OLMo-Tulu-Seed-1model· 3 dl3 dl
- 🤗itay1itzhak/OLMo-Tulu-Seed-2model· 1 dl1 dl
- 🤗itay1itzhak/OLMo-Tulu-Seed-0model· 3 dl3 dl
- 🤗itay1itzhak/OLMo-Flan-Seed-0model· 4 dl4 dl
- 🤗itay1itzhak/OLMo-Flan-Seed-2model· 5 dl5 dl
- 🤗itay1itzhak/OLMo-Flan-Seed-1model· 1 dl1 dl
- 🤗itay1itzhak/T5-Flan-Seed-2model
- 🤗itay1itzhak/T5-Flan-Seed-1model· 2 dl2 dl
- 🤗itay1itzhak/T5-Flan-Seed-0model· 2 dl2 dl
- 🤗itay1itzhak/T5-Tulu-Seed-0model
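The checkpoint names above appear to follow an owner/Backbone-Dataset-Seed-N pattern. Assuming that convention holds (it is inferred from the names, not documented on this page), a short helper can split an identifier into its parts so runs can be grouped by backbone or by instruction dataset. The function name `parse_model_id` is hypothetical.

```python
import re

# Assumed naming convention: <owner>/<backbone>-<dataset>-Seed-<n>
MODEL_ID_PATTERN = re.compile(
    r"^(?P<owner>[^/]+)/(?P<backbone>[^-]+)-(?P<dataset>[^-]+)-Seed-(?P<seed>\d+)$"
)


def parse_model_id(model_id):
    """Split a checkpoint identifier into owner, backbone, dataset, and seed."""
    match = MODEL_ID_PATTERN.match(model_id)
    if match is None:
        raise ValueError(f"Unrecognized model id format: {model_id!r}")
    fields = match.groupdict()
    fields["seed"] = int(fields["seed"])  # store the seed as an integer
    return fields


parsed = parse_model_id("itay1itzhak/OLMo-Tulu-Seed-1")
print(parsed)
```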
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning
