Large Language Models as Computable Approximations to Solomonoff Induction
Jun Wan, Lingrui Mei

TL;DR
This paper establishes a theoretical framework connecting large language models to Solomonoff induction via Algorithmic Information Theory, explaining their success and guiding improved few-shot learning strategies.
Contribution
It provides the first formal link between LLMs and Solomonoff induction, unifying explanations for emergent phenomena and proposing a new example selection method.
Findings
The training process approximates the Solomonoff prior through loss minimization.
Next-token prediction implements approximate Solomonoff induction.
The proposed few-shot example selection method improves performance, especially for smaller models.
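As a rough illustration of what "approximate Solomonoff induction" means in the findings above, the sketch below runs a computable toy analogue. It is not the paper's construction. It assumes a tiny hypothesis class (bit patterns repeated forever) in place of all programs, and it weights each pattern of length n by 2^(-2n), so that shorter descriptions receive more prior mass. The names `hypotheses` and `predict_next` and the `max_len` cutoff are hypothetical and chosen only for this example.

```python
from fractions import Fraction


def hypotheses(max_len):
    """Enumerate toy 'programs': every bit pattern up to max_len, repeated forever."""
    for n in range(1, max_len + 1):
        for k in range(2 ** n):
            yield format(k, f"0{n}b")


def predict_next(observed, max_len=4):
    """Return P(next bit == '1' | observed) under a length-penalized mixture.

    Assumption: a pattern of length n gets prior weight 2^(-2n), a stand-in
    for the 2^(-program length) weighting used by the Solomonoff prior.
    """
    weight_one = Fraction(0)
    weight_total = Fraction(0)
    for pattern in hypotheses(max_len):
        n = len(pattern)
        # Each hypothesis is deterministic: keep it only if it reproduces the prefix.
        if all(observed[i] == pattern[i % n] for i in range(len(observed))):
            prior = Fraction(1, 2 ** (2 * n))
            weight_total += prior
            if pattern[len(observed) % n] == "1":
                weight_one += prior
    if weight_total == 0:
        raise ValueError("no hypothesis in the toy class matches the observed bits")
    return weight_one / weight_total


if __name__ == "__main__":
    for prefix in ["", "01", "0101"]:
        print(repr(prefix), predict_next(prefix))
```

With the default `max_len=4`, the prefix "01" yields a probability of 3/14 for the next bit being 1, because the shortest consistent pattern ("01" repeated) predicts a 0 and carries the most weight. After "0101" every surviving pattern predicts 0. True Solomonoff induction mixes over all programs for a universal machine and is uncomputable, which is why any practical predictor can only approximate it.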
Abstract
The rapid advancement of large language models (LLMs) calls for a rigorous theoretical framework to explain their empirical success. While significant progress has been made in understanding LLM behaviors, existing theoretical frameworks remain fragmented in explaining emergent phenomena through a unified mathematical lens. We establish the first formal connection between LLM architectures and Algorithmic Information Theory (AIT) by proving two fundamental results: (1) the training process computationally approximates the Solomonoff prior through loss minimization interpreted as program length optimization, and (2) next-token prediction implements approximate Solomonoff induction. We leverage AIT to provide a unified theoretical explanation for in-context learning, few-shot learning, and scaling laws. Furthermore, our theoretical insights lead to a principled method for few-shot example…
Taxonomy
Topics: Computability, Logic, AI Algorithms
