TL;DR
This paper introduces MANO, a multipole attention mechanism inspired by n-body simulations, enabling linear-complexity transformers that perform well on vision and physics tasks while substantially reducing computational cost.
Contribution
The paper presents MANO, a novel multipole attention method that achieves linear time and memory complexity, improving efficiency over standard transformers, whose attention cost grows quadratically with input length.
Findings
MANO rivals state-of-the-art models such as ViT and Swin Transformer.
MANO reduces runtime and peak memory usage by orders of magnitude.
Empirical results on image classification and Darcy flow problems validate its effectiveness.
Abstract
Transformers have become the de facto standard for a wide range of tasks, from image classification to physics simulations. Despite their impressive performance, the quadratic complexity of standard Transformers in both memory and time with respect to the input length makes them impractical for processing high-resolution inputs. Therefore, several variants have been proposed, the most successful relying on patchification, downsampling, or coarsening techniques, often at the cost of losing the finest-scale details. In this work, we take a different approach. Inspired by state-of-the-art techniques in n-body numerical simulations, we cast attention as an interaction problem between grid points. We introduce the Multipole Attention Neural Operator (MANO), which computes attention in a distance-based multiscale fashion. MANO maintains, in each attention head, a global receptive field and…
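The abstract describes attention as an interaction problem between grid points, computed in a distance-based multiscale fashion. The NumPy sketch below illustrates that general idea with a textbook 1D fast-multipole-style hierarchy; it is not the MANO implementation. Assumptions (ours, not the paper's): a 1D grid whose length is a power of two, mean-pooled keys and values as the coarse summary of each cell, a log(cell size) bias so that one coarse score stands in for the sum over the cell's members, and standard FMM interaction lists. The function names are hypothetical. Nearby points are attended at full resolution, and progressively farther regions are attended through progressively coarser cells, so every query still sees the whole grid.

```python
import numpy as np


def dense_attention(q, k, v):
    """Standard softmax attention: N * N scores."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    return (w / w.sum(axis=-1, keepdims=True)) @ v


def pool(x, level):
    """Mean-pool rows of x over cells of 2**level consecutive grid points."""
    m = 2 ** level
    return x.reshape(-1, m, x.shape[-1]).mean(axis=1)


def interaction_cells(i, level):
    """Cell indices at `level` that query i attends to (1D FMM interaction list).

    Interaction list: children of the parent cell's neighbours that are not
    themselves neighbours of the query's cell. Level 0 also adds the near
    field (the query's own point and its two immediate neighbours).
    """
    qi = i >> level
    parent = qi >> 1
    children = range(2 * parent - 2, 2 * parent + 4)
    cells = [c for c in children if abs(c - qi) > 1]
    if level == 0:
        cells += [qi - 1, qi, qi + 1]
    return cells


def multipole_attention(q, k, v):
    """Hypothetical hierarchical attention; returns (output, number of scores)."""
    n, d = q.shape
    assert n >= 4 and n & (n - 1) == 0, "sketch assumes a power-of-two grid"
    # At level n_levels only two cells remain, and they are mutual neighbours,
    # so the lists for levels 0 .. n_levels - 1 partition the whole grid.
    n_levels = n.bit_length() - 2
    pooled = [(pool(k, l), pool(v, l)) for l in range(n_levels)]
    out = np.empty((n, v.shape[-1]))
    n_scores = 0
    for i in range(n):
        logits, values = [], []
        for l in range(n_levels):
            kl, vl = pooled[l]
            cells = [c for c in interaction_cells(i, l) if 0 <= c < kl.shape[0]]
            # + log(cell size): exp(score) approximates the sum over the cell
            logits.append(kl[cells] @ q[i] / np.sqrt(d) + l * np.log(2))
            values.append(vl[cells])
        s = np.concatenate(logits)
        w = np.exp(s - s.max())
        out[i] = (w / w.sum()) @ np.concatenate(values)
        n_scores += s.size
    return out, n_scores


rng = np.random.default_rng(0)
n, d = 1024, 16
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out, n_scores = multipole_attention(q, k, v)
print(n_scores, n * n)  # hierarchical score count vs. dense score count
```

Each query scores O(log N) cells instead of N points, and the coarse approximation is exact whenever keys are constant within each far cell. Because every query walks the hierarchy independently, this sketch costs O(N log N) rather than O(N); reaching the linear complexity reported for MANO would need further machinery (for example, sharing far-field work across queries) that the sketch does not attempt.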
