Attention is a smoothed cubic spline
Zehua Lai, Lek-Heng Lim, and Yucong Liu

TL;DR
This paper shows that the attention module in a transformer can be viewed as a smoothed cubic spline, which ties this deep learning component to classical approximation theory and gives a new mathematical account of its structure.
Contribution
It shows that, with ReLU activation, attention modules are cubic splines and every transformer component built from them is a cubic or higher-order spline, establishing a mathematical link between transformers and spline theory.
Findings
With ReLU activation, attention, masked attention, and encoder-decoder attention are all cubic splines.
Transformer components are compositions of cubic and higher-order splines.
Replacing ReLU with smooth activations like SoftMax yields a smoothed, infinitely differentiable version.
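The first finding can be checked numerically with a minimal sketch. It assumes the ReLU-activated attention has the form V·ReLU(KᵀQ) with Q, K, V linear in the input X and tokens stored as columns; the helper names, matrix sizes, and random weights are hypothetical and not taken from the paper. Along a line X0 + t·D, each score is a quadratic in t, so on any interval where no score changes sign the output is a cubic polynomial in t and its fourth finite difference vanishes up to roundoff.

```python
import numpy as np

def relu_attention(X, Wq, Wk, Wv):
    """ReLU-activated attention V @ relu(K.T @ Q); columns of X are tokens (assumed convention)."""
    Q, K, V = Wq @ X, Wk @ X, Wv @ X
    return V @ np.maximum(K.T @ Q, 0.0)

def safe_half_width(X0, D, Wq, Wk):
    """Return T such that no entry of K.T @ Q changes sign for |t| <= T along X0 + t*D."""
    Q0, K0, Qd, Kd = Wq @ X0, Wk @ X0, Wq @ D, Wk @ D
    a = K0.T @ Q0                  # constant term of each score
    b = K0.T @ Qd + Kd.T @ Q0      # coefficient of t
    c = Kd.T @ Qd                  # coefficient of t**2
    # For T <= 1: |b|*T + |c|*T**2 <= (|b| + |c|)*T <= |a|/2 < |a|, so signs are fixed.
    ratio = np.abs(a) / (np.abs(b) + np.abs(c) + 1e-300)
    return 0.5 * min(1.0, ratio.min())

def fourth_difference(f, T):
    """Fourth finite difference of f on five equally spaced points in [-T, T]."""
    ts = np.linspace(-T, T, 5)
    vals = np.stack([f(t) for t in ts])
    coeffs = np.array([1.0, -4.0, 6.0, -4.0, 1.0])
    return np.tensordot(coeffs, vals, axes=1), np.abs(vals).max()

rng = np.random.default_rng(0)
d, n = 4, 3  # hypothetical embedding size and number of tokens
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
X0, D = rng.standard_normal((d, n)), rng.standard_normal((d, n))

T = safe_half_width(X0, D, Wq, Wk)
diff4, scale = fourth_difference(lambda t: relu_attention(X0 + t * D, Wq, Wk, Wv), T)
rel = np.abs(diff4).max() / max(scale, 1.0)
print(rel)  # roundoff-sized: on this piece the restriction to the line is a cubic in t
```

The interval half-width is chosen conservatively from the quadratic coefficients of each score, so the check stays inside a single polynomial piece. Crossing a sign change of any score would move to a different cubic piece, which is the piecewise behaviour the spline description refers to.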
Abstract
We highlight a perhaps important but hitherto unobserved insight: The attention module in a transformer is a smoothed cubic spline. Viewed in this manner, this mysterious but critical component of a transformer becomes a natural development of an old notion deeply entrenched in classical approximation theory. More precisely, we show that with ReLU-activation, attention, masked attention, encoder-decoder attention are all cubic splines. As every component in a transformer is constructed out of compositions of various attention modules (= cubic splines) and feed forward neural networks (= linear splines), all its components -- encoder, decoder, and encoder-decoder blocks; multilayered encoders and decoders; the transformer itself -- are cubic or higher-order splines. If we assume the Pierce-Birkhoff conjecture, then the converse also holds, i.e., every spline is a ReLU-activated encoder.…
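The remark that feed forward networks are linear splines can be illustrated in the simplest setting. The sketch below assumes a scalar input and a single hidden ReLU layer; the weights are hand-picked, hypothetical values chosen only so the arithmetic is exact. Each hidden unit switches on or off at one knot, the network is affine between consecutive knots, and its slope jumps at a knot.

```python
import numpy as np

def relu_ffn(x, w1, b1, w2, b2):
    """Scalar-input, one-hidden-layer ReLU network: w2 . relu(w1*x + b1) + b2."""
    return w2 @ np.maximum(np.outer(w1, x) + b1[:, None], 0.0) + b2

def second_difference(f, left, right):
    """f(left) - 2*f(mid) + f(right); zero whenever f is affine on [left, right]."""
    vals = f(np.array([left, 0.5 * (left + right), right]))
    return vals[0] - 2.0 * vals[1] + vals[2]

# Hypothetical weights for illustration only.
w1 = np.array([1.0, -2.0, 0.5])
b1 = np.array([-1.0, 1.0, 1.0])
w2 = np.array([2.0, 1.0, -3.0])
b2 = 0.25

f = lambda x: relu_ffn(x, w1, b1, w2, b2)
knots = np.sort(-b1 / w1)  # hidden unit i changes state at x = -b1[i] / w1[i]
print(knots)               # knots at -2, 0.5 and 1

print(second_difference(f, knots[0], knots[1]))  # 0.0: affine between two adjacent knots
print(second_difference(f, 0.0, 1.0))            # 1.0: the slope jumps at the knot x = 0.5
```

In higher dimensions the knots become hyperplane arrangements rather than points, but the same reasoning applies piece by piece.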
Taxonomy
Topics: Advanced Numerical Analysis Techniques
Methods: Softmax · Attention Is All You Need
