The Curved Spacetime of Transformer Architectures
Riccardo Di Sipio, Jairo Diaz-Rodriguez, Luis Serrano

TL;DR
This paper introduces a geometric framework for Transformer models that treats their operations as motion through a curved spacetime. Attention induces curvature, which bends token trajectories and offers new insight into the models' internal mechanics.
Contribution
It proposes a novel analogy between Transformer architectures and General Relativity, introducing a curvature-based perspective to analyze and visualize token interactions and model behavior.
Findings
Visualization of curvature landscape across tokens and layers
Demonstration of non-linear token trajectories due to curvature
Evidence of attention-induced curvature affecting embedding paths
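The trajectory findings suggest a simple diagnostic. Below is a minimal sketch, assuming each token's hidden state after every layer is available as a list of floats. It measures the angle between consecutive layer-wise displacement steps: a straight path gives zero at every interior layer, and bending gives positive angles. The turning angle is a discrete proxy chosen here for illustration, not necessarily the curvature measure the authors use, and `turning_angles` and `curvature_grid` are hypothetical names.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _sub(a: Vector, b: Vector) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def turning_angles(trajectory: Sequence[Vector]) -> List[float]:
    """Angles (radians) between consecutive layer-wise steps of one token.

    trajectory[i] is the token's hidden state after layer i (hypothetical
    input format). A straight path yields 0.0 at every interior layer.
    """
    steps = [_sub(trajectory[i + 1], trajectory[i]) for i in range(len(trajectory) - 1)]
    angles: List[float] = []
    for s0, s1 in zip(steps, steps[1:]):
        denom = math.sqrt(_dot(s0, s0)) * math.sqrt(_dot(s1, s1))
        if denom == 0.0:
            # Assumed convention: a zero-length step has no direction, so record no turn.
            angles.append(0.0)
            continue
        # Clamp to [-1, 1] to guard acos against floating-point overshoot.
        cos_theta = max(-1.0, min(1.0, _dot(s0, s1) / denom))
        angles.append(math.acos(cos_theta))
    return angles


def curvature_grid(hidden_states: Sequence[Sequence[Vector]]) -> List[List[float]]:
    """hidden_states[i][t] is token t after layer i; returns grid[t][k],
    the turning angle of token t at interior layer k + 1."""
    n_tokens = len(hidden_states[0])
    return [turning_angles([layer[t] for layer in hidden_states]) for t in range(n_tokens)]


if __name__ == "__main__":
    straight = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
    bent = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
    print(turning_angles(straight))  # [0.0, 0.0]
    print(turning_angles(bent))      # [1.5707963267948966]
```

Plotting the tokens-by-layers grid from `curvature_grid` as a heat map gives one possible picture of a curvature landscape across tokens and layers.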
Abstract
We present a geometric framework for understanding Transformer-based language models, drawing an explicit analogy to General Relativity. Queries and keys induce an effective metric on representation space, and attention acts as a discrete connection that implements parallel transport of value vectors across tokens. Stacked layers provide discrete time-slices through which token representations evolve on this curved manifold, while backpropagation plays the role of a least-action principle that shapes loss-minimizing trajectories in parameter space. If this analogy is correct, token embeddings should not traverse straight paths in feature space; instead, their layer-wise steps should bend and reorient through interactions mediated by embedding-space curvature. To test this prediction, we design experiments that expose both the presence and the consequences of curvature: (i) we visualize a…
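The abstract's central mechanism can be made concrete with a single attention head. The sketch below, in plain Python, assumes the "effective metric" is read off the query-key bilinear form G = W_Q^T W_K (so that q_i · k_j = x_i^T G x_j) and takes its symmetric part as the metric-like object. The abstract does not state the paper's exact construction, so `effective_bilinear_form` and `symmetric_part` are hypothetical helpers. `attention_step` is ordinary scaled dot-product attention: each token receives a softmax-weighted combination of value vectors, which is the operation the abstract interprets as discrete parallel transport.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]
Vector = Sequence[float]


def matvec(m: Matrix, v: Vector) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def effective_bilinear_form(w_q: Matrix, w_k: Matrix) -> Matrix:
    """G = W_Q^T W_K (d_model x d_model), so that q_i . k_j = x_i^T G x_j.

    Hypothetical reading of the 'effective metric'; not the paper's definition.
    """
    d_k, d_model = len(w_q), len(w_q[0])
    return [[sum(w_q[r][a] * w_k[r][b] for r in range(d_k)) for b in range(d_model)]
            for a in range(d_model)]


def symmetric_part(g: Matrix) -> Matrix:
    """(G + G^T) / 2: the only part of G that enters the quadratic form x^T G x."""
    n = len(g)
    return [[0.5 * (g[a][b] + g[b][a]) for b in range(n)] for a in range(n)]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_step(xs: Sequence[Vector], w_q: Matrix, w_k: Matrix,
                   w_v: Matrix) -> List[List[float]]:
    """Single-head scaled dot-product attention over token states xs.

    Each output is a softmax-weighted (convex) combination of the value
    vectors W_V x_j.
    """
    d_k = len(w_k)  # key dimension = number of rows of W_K
    queries = [matvec(w_q, x) for x in xs]
    keys = [matvec(w_k, x) for x in xs]
    values = [matvec(w_v, x) for x in xs]
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out
```

Note that the symmetric part of G need not be positive definite, so under this reading it is not guaranteed to be a Riemannian metric in the strict sense.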
Taxonomy
Topics: Embodied and Extended Cognition · Data Visualization and Analytics · Face Recognition and Perception
