Large Language Models: A Mathematical Formulation
Ricardo Baptista, Andrew Stuart, Son Tran

TL;DR
This paper introduces a mathematical framework for large language models that covers how text is encoded into tokens, the architecture of next-token prediction models, how those models are learned from data, and how they are deployed across tasks, with the aim of making their behavior easier to understand and improve.
Contribution
It provides a clear, accessible mathematical formulation of LLMs, connecting information theory, probability, and optimization to their design and application.
Findings
Framework clarifies how LLMs encode and predict text sequences.
Notes that the resulting algorithmic structure has demonstrated empirical success.
Suggests new directions for LLM development and analysis.
Abstract
Large language models (LLMs) process and predict sequences containing text to answer questions, and address tasks including document summarization, providing recommendations, writing software and solving quantitative problems. We provide a mathematical framework for LLMs by describing the encoding of text sequences into sequences of tokens, defining the architecture for next-token prediction models, explaining how these models are learned from data, and demonstrating how they are deployed to address a variety of tasks. The mathematical sophistication required to understand this material is not high, and relies on straightforward ideas from information theory, probability and optimization. Nonetheless, the combination of ideas resting on these different components from the mathematical sciences yields a complex algorithmic structure; and this algorithmic structure has demonstrated…
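The pipeline the abstract outlines (encode text as tokens, fit a next-token model to data, then deploy it by generating tokens) can be illustrated with a minimal sketch. This is not the paper's formulation: it assumes a character-level encoder and a smoothed bigram count model as a stand-in for the neural architecture, and all function names here are hypothetical.

```python
import math
from collections import Counter, defaultdict

def build_vocab(text):
    """Toy encoder (assumption): one integer token id per distinct character."""
    return {ch: i for i, ch in enumerate(sorted(set(text)))}

def encode(text, vocab):
    """Turn a text sequence into a sequence of token ids."""
    return [vocab[ch] for ch in text]

def fit_bigram(tokens, vocab_size):
    """Estimate p(next | current) from pair counts with add-one smoothing.

    A bigram table stands in for the next-token prediction architecture.
    """
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    probs = {}
    for cur in range(vocab_size):
        total = sum(counts[cur].values()) + vocab_size
        probs[cur] = [(counts[cur][n] + 1) / total for n in range(vocab_size)]
    return probs

def mean_nll(tokens, probs):
    """Average negative log-likelihood of each token given its predecessor."""
    pairs = list(zip(tokens, tokens[1:]))
    return -sum(math.log(probs[c][n]) for c, n in pairs) / len(pairs)

def generate(start, probs, steps):
    """Greedy deployment: repeatedly append the most probable next token."""
    out = [start]
    for _ in range(steps):
        row = probs[out[-1]]
        out.append(max(range(len(row)), key=row.__getitem__))
    return out

text = "abababab"
vocab = build_vocab(text)
tokens = encode(text, vocab)
probs = fit_bigram(tokens, len(vocab))
loss = mean_nll(tokens, probs)
sample = generate(vocab["a"], probs, 4)
```

Here `mean_nll` plays the role of the training objective: a model that predicts the data well has a lower value than a uniform guess over the vocabulary, which would score `log(len(vocab))`.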
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Big Data and Digital Economy
