k-Maximum Inner Product Attention for Graph Transformers and the Expressive Power of GraphGPS
Jonas De Schouwer, Haitz Sáez de Ocáriz Borde, Xiaowen Dong

TL;DR
This paper introduces k-MIP attention, a sparse top-k attention mechanism for graph transformers that achieves linear memory complexity while maintaining expressive power, so that large graphs can be processed efficiently.
Contribution
The paper proposes k-MIP attention, a sparse, top-k based attention mechanism that balances efficiency and expressive power in large-scale graph transformers.
Findings
k-MIP attention achieves linear memory complexity.
Enables processing of graphs with over 500k nodes on a single GPU.
Retains expressive power comparable to that of full-attention transformers.
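To give a feel for why linear memory matters at the 500k-node scale, the back-of-the-envelope sketch below compares the storage needed for attention scores when every query attends to every key versus only k keys. The value k = 10, the float32 score size, and the helper name `attention_score_bytes` are illustrative assumptions, not figures from the paper, and the count covers a single attention head only.

```python
def attention_score_bytes(num_queries, keys_per_query, bytes_per_score=4):
    """Bytes needed to store one attention score per (query, key) pair.

    bytes_per_score=4 assumes float32 scores (an assumption for illustration).
    """
    return num_queries * keys_per_query * bytes_per_score


n = 500_000  # node count at the scale mentioned in the findings
k = 10       # hypothetical number of keys kept per query

dense_bytes = attention_score_bytes(n, n)   # all-to-all: grows as n^2
sparse_bytes = attention_score_bytes(n, k)  # top-k: grows as n * k

print(f"dense: {dense_bytes / 1e12:.1f} TB")   # dense: 1.0 TB
print(f"top-k: {sparse_bytes / 1e6:.1f} MB")   # top-k: 20.0 MB
```

Under these assumptions the dense score matrix alone would be far larger than the memory of any single GPU, while the top-k scores fit comfortably.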
Abstract
Graph transformers have shown promise in overcoming limitations of traditional graph neural networks, such as oversquashing and difficulties in modeling long-range dependencies. However, their application to large-scale graphs is hindered by the quadratic memory and computational complexity of the all-to-all attention mechanism. Although alternatives such as linearized attention and restricted attention patterns have been proposed, these often degrade performance or limit expressive power. To better balance efficiency and effectiveness, we introduce k-Maximum Inner Product (k-MIP) attention for graph transformers. k-MIP attention selects the most relevant key nodes per query via a top-k operation, yielding a sparse yet flexible attention pattern. Combined with an attention score computation based on symbolic matrices, this results in linear memory complexity and practical speedups of up…
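The selection step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration of top-k inner-product attention, not the authors' implementation: the function name `kmip_attention`, the 1/sqrt(d) scaling, and the softmax over the selected keys are assumptions based on standard scaled dot-product attention. The sketch also materializes the dense score matrix for simplicity, so it shows only the sparse attention pattern and does not reproduce the linear memory footprint the paper reports.

```python
import numpy as np


def kmip_attention(Q, K, V, k):
    """Illustrative top-k inner-product attention (hypothetical helper).

    Q: (n, d) queries, K: (m, d) keys, V: (m, d_v) values, 1 <= k <= m.
    Each query attends only to the k keys with the largest inner product.
    """
    d = Q.shape[1]
    # Dense scores, built here only for clarity; this costs O(n * m) memory.
    scores = Q @ K.T / np.sqrt(d)                              # (n, m)
    # Indices of the k largest scores in each row (order within the k is arbitrary).
    top_idx = np.argpartition(scores, -k, axis=1)[:, -k:]      # (n, k)
    top_scores = np.take_along_axis(scores, top_idx, axis=1)   # (n, k)
    # Numerically stable softmax restricted to the selected keys.
    top_scores = top_scores - top_scores.max(axis=1, keepdims=True)
    weights = np.exp(top_scores)
    weights /= weights.sum(axis=1, keepdims=True)
    # Gather the selected value vectors and take their weighted sum.
    return np.einsum("nk,nkd->nd", weights, V[top_idx])        # (n, d_v)


rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 4))
K = rng.normal(size=(7, 4))
V = rng.normal(size=(7, 3))
out = kmip_attention(Q, K, V, k=2)  # each of the 5 queries uses 2 of the 7 keys
```

Setting k equal to the number of keys recovers ordinary full softmax attention, which is a convenient sanity check for a sketch like this.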
