Truth as a Compression Artifact in Language Model Training

Konstantin Krestnikov

arXiv:2603.11749 · cs.CL · April 7, 2026

TL;DR

This paper investigates why language models trained on contradictory data tend to prefer correct answers. By varying how compressible the errors are, it shows that models favor the most compressible answer rather than the truth as such.

Contribution

The paper introduces the Compression-Consistency Principle, showing that a model's bias towards an answer depends on the structural coherence of the competing false information, not on truth itself.

Findings

01

When errors are random, models extract the correct signal, with accuracy rising from 65% to 85% with model size.

02

When errors follow a coherent alternative rule system, accuracy drops to chance (~45-51%): the model cannot distinguish the false system from the truth.

03

A single coherent alternative rule eliminates the model's bias towards correct answers entirely, but adding a second competing rule restores most of it, demonstrating the impact of structural coherence.
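
The link between error structure and compressibility in these findings can be illustrated with a toy measurement. The sketch below is not the paper's method: it assumes each wrong answer is recorded as an offset from the true answer and uses zlib's compressed size as a crude stand-in for description length. The offset ranges, the two rules (+1 and -1), and the sample size are all hypothetical choices made for illustration.

```python
import random
import zlib


def compressed_size(offsets):
    """Deflate a list of small integer offsets and return the size in bytes."""
    # Shift each offset into 0..255 so it fits in one byte (assumes |offset| < 128).
    return len(zlib.compress(bytes(d + 128 for d in offsets), 9))


rng = random.Random(0)
n = 2000

# Random errors: each wrong answer is off by an independent nonzero amount.
random_errors = [rng.choice([d for d in range(-9, 10) if d != 0]) for _ in range(n)]
# One coherent rule (hypothetical): every wrong answer is exactly one too large.
one_rule = [1] * n
# Two competing rules (hypothetical): each wrong answer follows +1 or -1.
two_rules = [rng.choice([1, -1]) for _ in range(n)]

sizes = {
    "random": compressed_size(random_errors),
    "one_rule": compressed_size(one_rule),
    "two_rules": compressed_size(two_rules),
}
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

Under these assumptions the single rule compresses to almost nothing, the two competing rules cost more because each example also has to say which rule applies, and the random errors cost the most.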

Abstract

Why do language models trained on contradictory data prefer correct answers? In controlled experiments with small transformers (3.5M–86M parameters), we show that this preference tracks the compressibility structure of errors rather than truth per se. We train GPT-2-style models on corpora where each mathematical problem appears with both correct and incorrect solutions, a denoising design that directly models conflicting information about the same fact. When errors are random, models extract the correct signal with accuracy scaling from 65% to 85% with model size. When errors follow a coherent alternative rule system, accuracy drops to chance (~45–51%): the model cannot distinguish the false system from truth. A multi-rule experiment reveals a sharp crossover: a single coherent alternative rule eliminates truth bias entirely, but adding a second competing rule restores most of it…
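
The denoising design described in the abstract can be sketched as a small corpus generator. This is a minimal illustration under stated assumptions, not the code from the paper's repository: the task (two-operand addition), the text format, the function names `build_corpus`, `wrong_answer`, `plus_one`, and `minus_one`, and the specific false rules are all hypothetical.

```python
import random


def plus_one(a, b):
    """Hypothetical coherent false rule: the sum is always one too large."""
    return a + b + 1


def minus_one(a, b):
    """Hypothetical second false rule: the sum is always one too small."""
    return a + b - 1


def wrong_answer(a, b, mode, rng, rules):
    """Return an incorrect answer for a + b under the chosen error mode."""
    if mode == "random":
        # Unstructured noise: an independent nonzero offset per problem.
        return a + b + rng.choice([d for d in range(-9, 10) if d != 0])
    if mode == "coherent":
        # Systematic error: pick one of the alternative rules and apply it.
        return rng.choice(rules)(a, b)
    raise ValueError(f"unknown mode: {mode}")


def build_corpus(n, mode, seed=0, rules=None):
    """Emit every problem twice: once solved correctly, once incorrectly."""
    rng = random.Random(seed)
    rules = rules or [plus_one]
    lines = []
    for _ in range(n):
        a, b = rng.randint(0, 99), rng.randint(0, 99)
        lines.append(f"{a} + {b} = {a + b}")
        lines.append(f"{a} + {b} = {wrong_answer(a, b, mode, rng, rules)}")
    rng.shuffle(lines)  # the model never sees which copy is the correct one
    return lines
```

Calling `build_corpus(1000, "random")` would give the random-error condition, `build_corpus(1000, "coherent")` the single-rule condition, and `build_corpus(1000, "coherent", rules=[plus_one, minus_one])` a two-rule variant in the spirit of the multi-rule experiment.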

Code & Models

Repositories

rai220/compression-drives-truth
github

Datasets

krestnikov/compression-drives-truth
dataset · 102 downloads
