Transformers As Approximations of Solomonoff Induction

Nathan Young; Michael Witbrock

arXiv:2408.12065·cs.AI·August 23, 2024

Open Access

TL;DR

This paper investigates whether Transformer models, the foundation of large language models, approximate Solomonoff Induction more closely than other sequence prediction methods, aiming to understand their theoretical optimality.

Contribution

The paper proposes and explores the hypothesis that Transformers approximate Solomonoff Induction better than existing methods, providing a framework for future modeling of AI.

Findings

01 Evidence supporting Transformers as approximations of Solomonoff Induction

02 Alternative hypotheses considering the evidence

03 Outline of future research directions

Abstract

Solomonoff Induction is an optimal-in-the-limit unbounded algorithm for sequence prediction, representing a Bayesian mixture of every computable probability distribution and performing close to optimally in predicting any computable sequence. Being an optimal form of computational sequence prediction, it seems plausible that it may be used as a model against which other methods of sequence prediction might be compared. We put forth and explore the hypothesis that Transformer models - the basis of Large Language Models - approximate Solomonoff Induction better than any other extant sequence prediction method. We explore evidence for and against this hypothesis, give alternate hypotheses that take this evidence into account, and outline next steps for modelling Transformers and other kinds of AI in this way.
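
To make the "Bayesian mixture" idea in the abstract concrete, here is a minimal Python sketch. It is not Solomonoff Induction itself, which mixes over every computable distribution and is uncomputable. It assumes a hypothetical three-element hypothesis class of biased coins, each given a made-up description length in bits, and weights each one by 2^(-length) before updating on the observed bits. The names `HYPOTHESES` and `mixture_predict` are illustrative and do not come from the paper.

```python
from fractions import Fraction

# Hypothetical toy hypothesis class: (description_length_bits, P(next bit = 1)).
# Real Solomonoff Induction ranges over ALL computable distributions; this
# finite class and its description lengths are assumptions for illustration.
HYPOTHESES = [
    (1, Fraction(1, 2)),   # fair coin
    (2, Fraction(9, 10)),  # mostly ones
    (2, Fraction(1, 10)),  # mostly zeros
]


def mixture_predict(bits):
    """Return the mixture's probability that the next bit is 1."""
    # Simplicity prior: shorter descriptions get exponentially more weight.
    weights = [Fraction(1, 2 ** length) for length, _ in HYPOTHESES]
    # Bayesian update: scale each weight by the likelihood of each observed bit.
    for bit in bits:
        weights = [
            w * (p if bit == 1 else 1 - p)
            for w, (_, p) in zip(weights, HYPOTHESES)
        ]
    total = sum(weights)
    # Posterior-weighted average of the individual predictions.
    return sum(w * p for w, (_, p) in zip(weights, HYPOTHESES)) / total


if __name__ == "__main__":
    print(float(mixture_predict([])))            # prediction from the prior alone
    print(float(mixture_predict([1, 1, 1, 1])))  # prediction after seeing four ones
```

With no observations the prediction comes from the prior alone. After a run of ones, posterior weight shifts toward the "mostly ones" hypothesis and the predicted probability of another one rises. This shift of weight toward hypotheses that explain the data is the mechanism that the full, unbounded mixture applies across all computable distributions.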

Taxonomy

Topics: Computability, Logic, AI Algorithms

Methods: Attention Is All You Need · Linear Layer · Residual Connection · Multi-Head Attention · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Dense Connections · Byte Pair Encoding · Absolute Position Encodings