Formal Algorithms for Transformers

Mary Phuong; Marcus Hutter

arXiv:2207.09238 · cs.LG · July 26, 2022 · 52 cites

TL;DR

This paper provides a detailed, mathematically rigorous overview of transformer architectures, focusing on their components, training methods, and applications, serving as a comprehensive reference for understanding transformer models.

Contribution

It offers a self-contained, mathematically precise overview of transformers (architecture, training, and key models), filling a gap in precise, formal descriptions of these algorithms. Its scope is algorithms, not results.

Findings

01. Clarifies the architectural components of transformers
02. Details training procedures and applications
03. Previews prominent transformer models

Abstract

This document aims to be a self-contained, mathematically precise overview of transformer architectures and algorithms (*not* results). It covers what transformers are, how they are trained, what they are used for, their key architectural components, and a preview of the most prominent models. The reader is assumed to be familiar with basic ML terminology and simpler neural network architectures such as MLPs.
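
The paper's own pseudocode is not reproduced on this page. As a rough illustration of the kind of architectural component the abstract refers to, the following is a minimal sketch of single-query scaled dot-product attention in plain Python. The function names `softmax` and `attend` and the list-based vector representation are assumptions made for this example, not the paper's notation, and the sketch omits learned projections, masking, and multiple heads.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-query scaled dot-product attention (illustrative sketch).

    query:  list of d floats
    keys:   list of T lists, each of d floats
    values: list of T lists, each of d_out floats
    Returns a list of d_out floats: the attention-weighted mean of values.
    """
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(d).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    d_out = len(values[0])
    # Weighted sum of the value vectors, one output coordinate at a time.
    return [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(d_out)
    ]


if __name__ == "__main__":
    out = attend(
        [1.0, 0.0],
        [[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 2.0], [3.0, 4.0]],
    )
    print(out)
```

When all keys are identical, every score is equal, so the weights are uniform and the output reduces to the plain average of the value vectors. That property makes a convenient sanity check for a sketch like this one.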

Code & Models

Repositories

myazdani/formal-algorithms-for-transformers
pytorch

Models

🤗 wordgrammer/Plato_v1
model · ♡ 1

Taxonomy

Topics: Energy Load and Power Forecasting · Power Transformer Diagnostics and Insulation