Transformers As Approximations of Solomonoff Induction
Nathan Young, Michael Witbrock

TL;DR
This paper investigates whether Transformer models, the foundation of large language models, approximate Solomonoff Induction more closely than other sequence prediction methods, aiming to understand their theoretical optimality.
Contribution
The paper proposes and explores the hypothesis that Transformers approximate Solomonoff Induction better than existing methods, providing a framework for future modeling of AI.
Findings
Evidence supporting Transformers as approximations of Solomonoff Induction
Alternative hypotheses considering the evidence
Outline of future research directions
Abstract
Solomonoff Induction is an optimal-in-the-limit unbounded algorithm for sequence prediction, representing a Bayesian mixture of every computable probability distribution and performing close to optimally in predicting any computable sequence. Being an optimal form of computational sequence prediction, it seems plausible that it may be used as a model against which other methods of sequence prediction might be compared. We put forth and explore the hypothesis that Transformer models - the basis of Large Language Models - approximate Solomonoff Induction better than any other extant sequence prediction method. We explore evidence for and against this hypothesis, give alternate hypotheses that take this evidence into account, and outline next steps for modelling Transformers and other kinds of AI in this way.
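The "Bayesian mixture" idea in the abstract can be illustrated with a toy sketch. Solomonoff Induction itself is incomputable, so this is not an implementation of it. It assumes a tiny, hand-picked hypothesis class of four binary predictors and weights each by a prior of 2^-length, where the description lengths are made up for illustration. All names here (`HYPOTHESES`, `mixture_predict`, the individual predictors) are hypothetical and do not come from the paper.

```python
from fractions import Fraction

# Toy stand-ins for "computable distributions": each function returns
# P(next bit = 1 | history). These four models are illustrative assumptions.
def always_zero(history):
    return Fraction(1, 100)

def always_one(history):
    return Fraction(99, 100)

def alternate(history):
    # Expects the opposite of the last bit; expects 0 on an empty history.
    if not history or history[-1] == 1:
        return Fraction(1, 100)
    return Fraction(99, 100)

def fair_coin(history):
    return Fraction(1, 2)

# (assumed description length in bits, model). The lengths are invented so
# that the priors 2^-length sum to 1; a real universal prior ranges over
# all programs and cannot be enumerated like this.
HYPOTHESES = [(2, always_zero), (3, always_one), (3, alternate), (1, fair_coin)]

def mixture_predict(history):
    """Posterior-weighted P(next bit = 1 | history) over HYPOTHESES."""
    weights = []
    for length, model in HYPOTHESES:
        w = Fraction(1, 2 ** length)          # prior weight 2^-length
        for t, bit in enumerate(history):     # multiply in the likelihood
            p1 = model(history[:t])
            w *= p1 if bit == 1 else 1 - p1
        weights.append(w)
    total = sum(weights)
    return sum(w * m(history) for w, (_, m) in zip(weights, HYPOTHESES)) / total

if __name__ == "__main__":
    # After an alternating sequence ending in 1, the mixture should
    # assign low probability to another 1.
    print(float(mixture_predict([0, 1, 0, 1, 0, 1, 0, 1])))
```

With no data, the prediction is just the prior-weighted average of the four models. As an alternating sequence is observed, the posterior weight shifts toward `alternate` even though its assumed prior is smaller than that of `fair_coin`, which is the qualitative behaviour the mixture is meant to capture.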
Taxonomy
Topics: Computability, Logic, AI Algorithms
Methods: Attention Is All You Need · Linear Layer · Residual Connection · Multi-Head Attention · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Dense Connections · Byte Pair Encoding · Absolute Position Encodings
