Formal Algorithms for Transformers
Mary Phuong, Marcus Hutter

TL;DR
This paper gives a self-contained, mathematically precise overview of transformer architectures and algorithms: their key components, how they are trained, and what they are used for. It is a reference on the algorithms themselves, not on empirical results.
Contribution
It offers a self-contained, mathematically precise description of transformers, covering architecture, training, and the most prominent models, for readers already familiar with basic ML terminology and simpler architectures such as MLPs.
Findings
Clarifies the architectural components of transformers
Details training procedures and applications
Previews prominent transformer models
Abstract
This document aims to be a self-contained, mathematically precise overview of transformer architectures and algorithms (*not* results). It covers what transformers are, how they are trained, what they are used for, their key architectural components, and a preview of the most prominent models. The reader is assumed to be familiar with basic ML terminology and simpler neural network architectures such as MLPs.
Taxonomy
Topics: Energy Load and Power Forecasting · Power Transformer Diagnostics and Insulation
