A mathematical perspective on Transformers
Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet

TL;DR
This paper introduces a mathematical framework viewing Transformers as interacting particle systems, revealing cluster formation over time, and offers new insights for both mathematicians and computer scientists.
Contribution
It presents a novel mathematical perspective on Transformers, modeling them as particle systems to analyze their long-term behavior and cluster emergence.
Findings
In the particle-system model of Transformers, clusters emerge over long time scales.
The paper provides a new theoretical framework for understanding Transformers.
It connects mathematical theory with practical insights for AI models.
Abstract
Transformers play a central role in the inner workings of large language models. We develop a mathematical framework for analyzing Transformers based on their interpretation as interacting particle systems, which reveals that clusters emerge in long time. Our study explores the underlying theory and offers new perspectives for mathematicians as well as computer scientists.
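The abstract does not spell out the particle dynamics, so the following is only a minimal sketch under stated assumptions, not the authors' exact model. It assumes particles live on the unit sphere, query, key, and value matrices are all the identity, each layer corresponds to one small time step, and layer normalization is mimicked by projecting back onto the sphere. The function names and the parameters beta, dt, and steps are hypothetical and chosen for illustration.

```python
import numpy as np


def random_unit_vectors(n, d, rng):
    """Draw n points uniformly at random on the unit sphere in R^d."""
    x = rng.normal(size=(n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def attention_flow(x0, beta=1.0, dt=0.1, steps=3000):
    """Explicit Euler simulation of a simplified self-attention particle system.

    Assumptions (not taken from the summary above): identity query/key/value
    matrices, particles constrained to the unit sphere, one layer per time step.
    """
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        # Softmax attention weights computed from pairwise inner products.
        logits = beta * (x @ x.T)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        weights = np.exp(logits)
        weights /= weights.sum(axis=1, keepdims=True)
        # Each particle drifts toward its attention-weighted average.
        drift = weights @ x
        # Keep only the component tangent to the sphere at each particle.
        drift -= np.sum(drift * x, axis=1, keepdims=True) * x
        # Euler step, then renormalize back onto the sphere.
        x = x + dt * drift
        x /= np.linalg.norm(x, axis=1, keepdims=True)
    return x


def min_pairwise_cosine(x):
    """Smallest cosine similarity between any two particles (1.0 means one cluster)."""
    return float((x @ x.T).min())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = random_unit_vectors(32, 3, rng)
    x_final = attention_flow(x0)
    print("min pairwise cosine at start:", min_pairwise_cosine(x0))
    print("min pairwise cosine at end:  ", min_pairwise_cosine(x_final))
```

Comparing min_pairwise_cosine before and after the run gives a rough measure of how tightly the particles have gathered. The explicit Euler step with renormalization is a convenience choice for this sketch; a smaller dt or a different integrator would serve equally well for illustration.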
Taxonomy
Topics: Topic Modeling
