Large Language Models: A Mathematical Formulation

Ricardo Baptista; Andrew Stuart; Son Tran

arXiv:2601.22170·math.NA·February 2, 2026

Open Access

TL;DR

This paper presents a mathematical framework for large language models that covers how text is encoded into tokens, how next-token prediction models are architected, how they are learned from data, and how they are deployed on tasks, with the aim of making their behavior easier to understand and improve.

Contribution

It provides a clear, accessible mathematical formulation of LLMs, connecting information theory, probability, and optimization to their design and application.

Findings

01. Framework clarifies how LLMs encode and predict text sequences.

02. Demonstrates the empirical success of the mathematical structure.

03. Suggests new directions for LLM development and analysis.

Abstract

Large language models (LLMs) process and predict sequences containing text to answer questions, and address tasks including document summarization, providing recommendations, writing software and solving quantitative problems. We provide a mathematical framework for LLMs by describing the encoding of text sequences into sequences of tokens, defining the architecture for next-token prediction models, explaining how these models are learned from data, and demonstrating how they are deployed to address a variety of tasks. The mathematical sophistication required to understand this material is not high, and relies on straightforward ideas from information theory, probability and optimization. Nonetheless, the combination of ideas resting on these different components from the mathematical sciences yields a complex algorithmic structure; and this algorithmic structure has demonstrated…

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Big Data and Digital Economy