Constant Memory Attention Block
Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Yoshua Bengio, Mohamed Osama Ahmed

TL;DR
This paper introduces the Constant Memory Attention Block (CMAB), a novel attention mechanism that operates with fixed memory and computation, enabling efficient modeling in low-resource environments while maintaining competitive performance.
Contribution
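The paper presents CMAB, a general-purpose attention block that computes its output in constant memory and performs updates in constant computation. CMAB-based methods are introduced for Neural Processes and Temporal Point Processes, with results competitive with state-of-the-art.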
Findings
Achieves results competitive with state-of-the-art while being significantly more memory efficient
Demonstrates effectiveness in Neural Processes and Temporal Point Processes
Computes its output in constant memory and performs updates in constant computation
Abstract
Modern foundation model architectures rely on attention mechanisms to effectively capture context. However, these methods require linear or quadratic memory in terms of the number of inputs/datapoints, limiting their applicability in low-compute domains. In this work, we propose Constant Memory Attention Block (CMAB), a novel general-purpose attention block that computes its output in constant memory and performs updates in constant computation. Highlighting CMAB's efficacy, we introduce methods for Neural Processes and Temporal Point Processes. Empirically, we show our proposed methods achieve results competitive with state-of-the-art while being significantly more memory efficient.
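The constant-memory idea can be illustrated with a standard running-softmax accumulation. The sketch below is not the authors' CMAB implementation, and the abstract does not specify the block's internals. It only assumes a fixed set of latent query vectors cross-attending to a stream of datapoints, so that the stored state depends on the number of latents and not on the number of inputs. The names `StreamingCrossAttention` and `full_cross_attention` are hypothetical and chosen for illustration.

```python
import math


def full_cross_attention(queries, keys, values):
    """Reference softmax cross-attention that stores all keys and values."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        logits = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        m = max(logits)
        weights = [math.exp(l - m) for l in logits]
        total = sum(weights)
        dim_v = len(values[0])
        out.append([
            sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim_v)
        ])
    return out


class StreamingCrossAttention:
    """Fixed latent queries attending over a stream of (key, value) pairs.

    Assumption for illustration: the state is a running max, a running
    softmax denominator, and a running weighted value sum per query.
    Its size is independent of how many datapoints have been seen.
    """

    def __init__(self, queries):
        self.queries = queries
        self.max_logit = [-math.inf] * len(queries)
        self.denom = [0.0] * len(queries)
        self.numer = None  # allocated on the first update

    def update(self, key, value):
        """Fold one new datapoint into the state; cost does not grow with the stream."""
        if self.numer is None:
            self.numer = [[0.0] * len(value) for _ in self.queries]
        scale = 1.0 / math.sqrt(len(key))
        for i, q in enumerate(self.queries):
            logit = scale * sum(a * b for a, b in zip(q, key))
            new_max = max(self.max_logit[i], logit)
            # Rescale the old accumulators to the new reference maximum.
            old_factor = math.exp(self.max_logit[i] - new_max)
            weight = math.exp(logit - new_max)
            self.denom[i] = self.denom[i] * old_factor + weight
            self.numer[i] = [
                n * old_factor + weight * v
                for n, v in zip(self.numer[i], value)
            ]
            self.max_logit[i] = new_max

    def output(self):
        """Return the attention output for every latent query."""
        if self.numer is None:
            raise ValueError("no datapoints have been seen yet")
        return [[n / d for n in row] for row, d in zip(self.numer, self.denom)]
```

Feeding datapoints one at a time through `update` and then calling `output` should match `full_cross_attention` on the same data up to floating-point error, while never holding more than one datapoint in memory at once.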
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Advanced Graph Neural Networks
