Physical models realizing the transformer architecture of large language models
Zeqian Chen

TL;DR
This paper develops physical models of large language models built on the transformer architecture. It treats these models as open quantum systems in Fock space, with the aim of deepening theoretical understanding of their physical nature.
Contribution
It introduces a physical modeling approach that treats transformers as open quantum systems, connecting neural network architecture with quantum physics.
Findings
Transformers can be modeled as open quantum systems in Fock space.
The physical models underpin the transformer architecture for large language models.
The models offer a new perspective on the physical realization of neural network models.
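The Fock space in these findings is built over a Hilbert space of tokens. The summary does not spell out the paper's exact construction, so the following is only a minimal sketch under stated assumptions. We assume the token Hilbert space is C^V, with one basis vector per vocabulary entry, and that the Fock space is the full (unsymmetrized) direct sum of its tensor powers, truncated at a maximum sequence length. Under these assumptions, a token sequence of length n becomes a product basis vector in the n-particle sector. The function names are hypothetical.

```python
import numpy as np

def fock_dimension(vocab_size: int, max_length: int) -> int:
    """Dimension of the full Fock space over C^vocab_size truncated at
    max_length particles: sum of vocab_size**n for n = 0..max_length
    (the n = 0 term is the one-dimensional vacuum sector).
    Returns 0 when max_length < 0 (empty range)."""
    return sum(vocab_size ** n for n in range(max_length + 1))

def sequence_state(tokens: list[int], vocab_size: int) -> np.ndarray:
    """Map tokens (t_1, ..., t_n) to e_{t_1} (x) ... (x) e_{t_n},
    a unit vector in the n-fold tensor power of C^vocab_size."""
    state = np.ones(1, dtype=complex)  # empty sequence: vacuum amplitude 1
    for t in tokens:
        e = np.zeros(vocab_size, dtype=complex)
        e[t] = 1.0
        state = np.kron(state, e)  # tensor product with the next token
    return state

def fock_embed(tokens: list[int], vocab_size: int, max_length: int) -> np.ndarray:
    """Place the n-particle vector into its sector of the truncated direct sum.
    Sectors are laid out in order n = 0, 1, ..., max_length (an assumption)."""
    n = len(tokens)
    if n > max_length:
        raise ValueError("sequence longer than the truncation length")
    offset = fock_dimension(vocab_size, n - 1)  # total size of sectors 0..n-1
    vec = np.zeros(fock_dimension(vocab_size, max_length), dtype=complex)
    sector = sequence_state(tokens, vocab_size)
    vec[offset:offset + sector.size] = sector
    return vec
```

For a two-token vocabulary truncated at length 3, the space has dimension 1 + 2 + 4 + 8 = 15. The sequence (1, 0) lands at position 5 of the full vector: offset 3 for the lower sectors, plus index 2 inside the two-particle sector.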
Abstract
The introduction of the transformer architecture in 2017 marked the most striking advance in natural language processing. The transformer is a model architecture that relies entirely on an attention mechanism to draw global dependencies between input and output. However, we believe there is a gap in our theoretical understanding of what the transformer is and how it works physically. From a physical perspective on modern chips, such as those fabricated at process nodes below 28 nm, modern intelligent machines should be regarded as open quantum systems, going beyond conventional statistical systems. Accordingly, in this paper we construct physical models that realize large language models based on the transformer architecture as open quantum systems in the Fock space over the Hilbert space of tokens. Our physical models underlie the transformer architecture for large language models.
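The abstract describes attention as the mechanism that draws global dependencies between input and output. For reference, here is a minimal NumPy sketch of the standard scaled dot-product attention from the original 2017 transformer, softmax(QK^T / sqrt(d_k)) V. This is the conventional computation that the physical models aim to underlie, not the open-quantum-system construction of this paper. The function name and array shapes are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray):
    """Standard attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    Returns the (n_queries, d_v) output and the (n_queries, n_keys) weights.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # query-key similarities
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V, weights
```

Each output row is a convex combination of all value rows. This is what lets every position depend on every other position regardless of distance. Subtracting the row-wise maximum before exponentiating leaves the softmax unchanged but avoids overflow.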
