Transformers Are Born Biased: Structural Inductive Biases at Random Initialization and Their Practical Consequences

Siquan Li; Yao Tong; Haonan Wang; Tianyang Hu

arXiv:2602.05927·stat.ML·February 6, 2026

TL;DR

Transformers exhibit strong, systematic biases at random initialization, caused by an architecture-induced contraction of token representations. These biases persist after training and influence model behavior and stability.

Contribution

It uncovers the intrinsic biases in randomly initialized transformers, explains their mechanistic origin, and introduces SeedPrint for model fingerprinting and bias analysis.

Findings

01. Untrained transformers show extreme token preferences.

02. Initialization biases persist through training.

03. Identifies a positional discrepancy causing attention sinks.

Abstract

Transformers underpin modern large language models (LLMs) and are commonly assumed to be behaviorally unstructured at random initialization, with all meaningful preferences emerging only through large-scale training. We challenge this assumption by showing that randomly initialized transformers already exhibit strong and systematic structural biases. In particular, untrained models display extreme token preferences: across random input sequences, certain tokens are predicted with probabilities orders of magnitude larger than those of other tokens. We provide a mechanistic explanation for this phenomenon by dissecting the transformer architecture at initialization. We show that extreme token preference arises from a contraction of token representations along a random seed-dependent direction. This contraction is driven by two interacting forces: (i) asymmetric nonlinear activations in MLP sublayers induce global…
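
The MLP side of this contraction can be illustrated with a small NumPy experiment. This is a sketch under stated assumptions, not the authors' setup: it assumes a stack of pre-LayerNorm residual ReLU MLP blocks with Gaussian weights, no attention, and a random unembedding matrix. The function names (`run_random_mlp_stack`, `mean_pairwise_cosine`, `top_token_share`) and all sizes are hypothetical choices made for illustration. It tracks how similar independent random token vectors become as they pass through untrained blocks, and how often the single most-picked token wins the argmax.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learned gain or bias)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def mean_pairwise_cosine(x):
    """Average cosine similarity over all distinct pairs of rows."""
    unit = x / np.linalg.norm(x, axis=-1, keepdims=True)
    sim = unit @ unit.T
    n = x.shape[0]
    return float((sim.sum() - np.trace(sim)) / (n * (n - 1)))

def top_token_share(x, unembed):
    """Fraction of rows whose argmax logit is the single most-picked token."""
    picks = (layer_norm(x) @ unembed).argmax(axis=-1)
    counts = np.bincount(picks, minlength=unembed.shape[1])
    return float(counts.max() / len(picks))

def run_random_mlp_stack(seed=0, n_tokens=256, d_model=64, d_hidden=256,
                         depth=12, vocab=50):
    """Push random token vectors through untrained residual ReLU MLP blocks.

    Assumed toy architecture (not taken from the paper): x <- x + ReLU(LN(x) W_in) W_out.
    Returns the mean pairwise cosine and the top-token share after each block.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_tokens, d_model))       # independent random "tokens"
    unembed = rng.normal(size=(d_model, vocab))    # random output projection
    sims = [mean_pairwise_cosine(x)]
    shares = [top_token_share(x, unembed)]
    for _ in range(depth):
        w_in = rng.normal(scale=d_model ** -0.5, size=(d_model, d_hidden))
        w_out = rng.normal(scale=d_hidden ** -0.5, size=(d_hidden, d_model))
        # ReLU outputs are non-negative, so every token receives a shared offset.
        x = x + np.maximum(layer_norm(x) @ w_in, 0.0) @ w_out
        sims.append(mean_pairwise_cosine(x))
        shares.append(top_token_share(x, unembed))
    return sims, shares

sims, shares = run_random_mlp_stack()
print("mean pairwise cosine, input vs. final block:", sims[0], sims[-1])
print("top-token share, input vs. final block:", shares[0], shares[-1])
```

In this toy setting the inputs start out nearly orthogonal on average, and the asymmetric activation adds a common component to every token at each block, so the mean pairwise cosine is expected to rise with depth. The size of the effect at this scale should not be read as a reproduction of the paper's measurements.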

Taxonomy

Topics: Language and cultural evolution · Language Development and Disorders · Neurobiology of Language and Bilingualism