Continuum Attention for Neural Operators
Edoardo Calvello, Nikola B. Kovachki, Matthew E. Levine, Andrew M. Stuart

TL;DR
This paper introduces a function-space formulation of the attention mechanism, enabling the design of transformer neural operators with universal approximation guarantees for mappings between function spaces.
Contribution
It formulates attention as an operator in infinite-dimensional function spaces and proves a universal approximation theorem for transformer neural operators.
Findings
First universal approximation theorem for transformer neural operators.
Efficient attention-based architectures for multi-dimensional domains.
Numerical results demonstrating effectiveness on operator learning problems.
Abstract
Transformers, and the attention mechanism in particular, have become ubiquitous in machine learning. Their success in modeling nonlocal, long-range correlations has led to their widespread adoption in natural language processing, computer vision, and time series problems. Neural operators, which map spaces of functions into spaces of functions, are necessarily both nonlinear and nonlocal if they are universal; it is thus natural to ask whether the attention mechanism can be used in the design of neural operators. Motivated by this, we study transformers in the function space setting. We formulate attention as a map between infinite-dimensional function spaces and prove that the attention mechanism as implemented in practice is a Monte Carlo or finite difference approximation of this operator. The function space formulation allows for the design of transformer neural operators, a class…
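The claim that practical attention approximates a continuum operator can be illustrated with a small numerical sketch. It assumes the continuum operator takes the usual softmax form, with the sum over tokens replaced by an integral over the domain, so that discrete attention is a ratio of sample averages approximating a ratio of integrals. The input function u, the matrices Q, K, V, and the helper names are all hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np

def u(x):
    """Hypothetical input function u: [0, 1] -> R^2, evaluated at points x."""
    return np.stack([np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)], axis=-1)

# Hypothetical query, key, and value matrices (fixed for reproducibility).
Q = np.array([[1.0, 0.5], [0.0, 1.0]])
K = np.array([[0.5, -1.0], [1.0, 0.5]])
V = np.array([[1.0, 0.0], [0.5, 2.0]])

def attention_at(x_query, x_samples):
    """Softmax attention output at x_query, with keys/values taken at x_samples."""
    q = u(np.atleast_1d(x_query)) @ Q.T          # (M, 2) queries
    k = u(x_samples) @ K.T                       # (N, 2) keys
    v = u(x_samples) @ V.T                       # (N, 2) values
    scores = q @ k.T                             # (M, N) inner products <Q u(x), K u(y)>
    scores -= scores.max(axis=1, keepdims=True)  # stabilise the exponentials
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)            # ratio of sums ~ ratio of integrals
    return w @ v                                 # (M, 2) attention output

def midpoint_grid(n):
    """Uniform midpoint grid on [0, 1] with n cells."""
    return (np.arange(n) + 0.5) / n

x0 = 0.3
coarse = attention_at(x0, midpoint_grid(64))     # grid-based discretisation
fine = attention_at(x0, midpoint_grid(128))      # same operator, finer grid
rng = np.random.default_rng(0)
mc = attention_at(x0, rng.uniform(0.0, 1.0, size=200_000))  # Monte Carlo samples

print("grid 64 vs grid 128:", np.abs(coarse - fine).max())
print("Monte Carlo vs grid 128:", np.abs(mc - fine).max())
```

Under these assumptions, the outputs at a fixed query point should agree more closely as the sampling of the domain is refined, whether the points come from a grid or from random draws. This is the sense in which the same attention weights define a single operator on functions rather than on a fixed-length token sequence.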
Taxonomy
Topics: Neural Networks and Applications
Methods: Activation Patching
