Inexact calculus of variations on the hyperspherical tangent bundle and its connections to the attention mechanism
Andrew Gracyk

TL;DR
This paper develops a mathematical framework for understanding the attention mechanism in Transformers as a calculus of variations problem on the hyperspherical tangent bundle, providing new theoretical insights into neural flow dynamics.
Contribution
The paper introduces a variational-calculus approach that treats Transformer attention as a flow on the hyperspherical tangent bundle, with foundational proofs and broader mathematical implications.
Findings
Attention as a variational problem on the hypersphere
Derivation of a new Euler-Lagrange equation for neural flows
Theoretical justification for Transformer token space dynamics
Abstract
We offer a theoretical mathematical background through Lagrangian optimization on the unit hyperspherical manifold and its tangential collection with application to the Transformer and its token space. Our methods are catered to the attention mechanism in a theoretical setting, but largely appeal to a broader mathematical lens as well. The Transformer, as a flow map, exists in the tangent fiber for each token along the high-dimensional unit sphere. The circumstance of the hypersphere across the latent data is reasonable due to the trained diagonal matrix equal to the identity, which has various empirical justifications. Thus, under the continuum limit of the dynamics, the latent vectors flow among the tangent bundle. Using these facts, we devise a mathematical framework focusing on the attention mechanism through calculus of variations. We develop a functional and show that the…
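As an illustration of the geometric setup the abstract describes (tokens on the unit hypersphere, with the flow confined to each token's tangent space), here is a minimal NumPy sketch. It is not taken from the paper. The softmax-weighted velocity field, the `beta` temperature, the step size `dt`, and all function names are assumptions chosen for illustration; only the tangent projection v - <x, v> x is standard geometry of the sphere.

```python
import numpy as np

def project_to_tangent(x, v):
    """Project v onto the tangent space of the unit sphere at x (rows of x have unit norm)."""
    return v - np.sum(x * v, axis=-1, keepdims=True) * x

def attention_velocity(X, beta=1.0):
    """Assumed toy velocity: softmax-weighted token average, projected to each tangent space."""
    scores = beta * X @ X.T                      # pairwise inner products between tokens
    scores -= scores.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    W = np.exp(scores)
    W /= W.sum(axis=1, keepdims=True)            # row-wise softmax weights
    return project_to_tangent(X, W @ X)

def euler_step(X, dt=0.1, beta=1.0):
    """One explicit Euler step, then renormalize each token back onto the sphere."""
    Y = X + dt * attention_velocity(X, beta)
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)

# Hypothetical example: 5 tokens in 8 dimensions, normalized to unit length.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
X /= np.linalg.norm(X, axis=1, keepdims=True)

V = attention_velocity(X)   # each row is orthogonal to the matching row of X
X_next = euler_step(X)      # each row still has unit norm
```

The projection keeps every velocity orthogonal to its token, so in the continuum limit the tokens never leave the sphere. The renormalization in `euler_step` only corrects the drift that a finite step size introduces.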
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural dynamics and brain function · Functional Brain Connectivity Studies
