In-Context Occam's Razor: How Transformers Prefer Simpler Hypotheses on the Fly
Puneesh Deora, Bhavya Vasudeva, Tina Behnia, Christos Thrampoulidis

TL;DR
This paper shows that, when tasks are hierarchically structured, transformers prefer the simplest explanation consistent with the in-context examples, effectively implementing an in-context Bayesian Occam's razor.
Contribution
The paper introduces a theoretical and empirical framework demonstrating that transformers prefer simpler hypotheses in context, and it validates this bias on a pretrained GPT-4 model using Boolean tasks.
Findings
Transformers identify the correct complexity level of tasks.
They favor the simplest sufficient explanation among multiple hypotheses.
The behavior aligns with a Bayesian Occam's razor principle.
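The Bayesian Occam's razor named in the last finding can be made concrete with a small model-comparison sketch. This is not the paper's experimental setup. It is a textbook illustration that compares the marginal likelihood (evidence) of a sequence under an order-0 (i.i.d.) model and an order-1 Markov chain, each with a symmetric Dirichlet prior. The function names, the prior strength `alpha`, and the two toy sequences are assumptions made for illustration only.

```python
import math
from collections import Counter


def log_dirichlet_multinomial(counts, num_symbols, alpha=1.0):
    """Log marginal likelihood of one ordered sequence with the given symbol
    counts, under a categorical model with a symmetric Dirichlet(alpha) prior."""
    total = sum(counts.values())
    result = math.lgamma(num_symbols * alpha) - math.lgamma(num_symbols * alpha + total)
    for symbol in range(num_symbols):
        result += math.lgamma(alpha + counts.get(symbol, 0)) - math.lgamma(alpha)
    return result


def log_evidence_order0(seq, num_symbols):
    """Evidence of seq[1:] given seq[0] when symbols are treated as i.i.d."""
    return log_dirichlet_multinomial(Counter(seq[1:]), num_symbols)


def log_evidence_order1(seq, num_symbols):
    """Evidence of seq[1:] given seq[0] under an order-1 Markov chain:
    one independent Dirichlet-categorical model per previous symbol."""
    total = 0.0
    for prev in range(num_symbols):
        row = Counter(nxt for cur, nxt in zip(seq, seq[1:]) if cur == prev)
        total += log_dirichlet_multinomial(row, num_symbols)
    return total


# Hypothetical toy data over the alphabet {0, 1}.
# balanced_seq: the next symbol is 0 or 1 about equally often regardless of
# the previous symbol, so an order-1 model has nothing extra to explain
# relative to order-0.
balanced_seq = [0, 0, 1, 1] * 50
# sticky_seq: long runs, so the next symbol depends strongly on the previous one.
sticky_seq = ([0] * 10 + [1] * 10) * 10

for name, seq in [("balanced", balanced_seq), ("sticky", sticky_seq)]:
    e0 = log_evidence_order0(seq, 2)
    e1 = log_evidence_order1(seq, 2)
    winner = "order-0" if e0 > e1 else "order-1"
    print(f"{name}: order-0 {e0:.2f}, order-1 {e1:.2f}, preferred: {winner}")
```

The order-1 model can represent anything the order-0 model can, yet on the balanced sequence it receives lower evidence, because its prior mass is spread over more parameters. On the sticky sequence the extra parameters pay for themselves and the order-1 model wins. This automatic penalty on unneeded complexity is the effect that the term "Bayesian Occam's razor" refers to.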
Abstract
In-context learning (ICL) enables transformers to adapt to new tasks through contextual examples without parameter updates. While existing research has typically studied ICL in fixed-complexity environments, practical language models encounter tasks spanning diverse complexity levels. This paper investigates how transformers navigate hierarchical task structures where higher-complexity categories can perfectly represent any pattern generated by simpler ones. We design well-controlled testbeds based on Markov chains and linear regression that reveal transformers not only identify the appropriate complexity level for each task but also accurately infer the corresponding parameters--even when the in-context examples are compatible with multiple complexity hypotheses. Notably, when presented with data generated by simpler processes, transformers consistently favor the least complex…
Taxonomy
Topics: Semantic Web and Ontologies
