How Particle-System Random Batch Methods Enhance Graph Transformer: Memory Efficiency and Parallel Computing Strategy
Hanwen Liu, Yixuan Ma, Shi Jin, Yuguang Wang

TL;DR
This paper introduces Random Batch Attention (RBA), a self-attention mechanism for Transformers that reduces the cost of attention from quadratic to linear, improves memory efficiency, and maintains expressivity. It is supported by theoretical analysis and validated through experiments on large graphs.
Contribution
The paper proposes a novel linear self-attention mechanism with theoretical support, improving the efficiency and scalability of Transformer models.
Findings
RBA has linear time complexity.
A parallel implementation along a new dimension saves memory.
RBA is effective on large graph datasets.
Abstract
The attention mechanism is a central component of Transformer models. It helps extract features from embedded vectors by adding global information, and its expressivity has been proven to be powerful. Nevertheless, its quadratic complexity restricts its practicality. Although several studies have proposed sparse forms of attention, they lack theoretical analysis of whether the mechanism keeps its expressivity while complexity is reduced. In this paper, we put forward Random Batch Attention (RBA), a linear self-attention mechanism with theoretical support for its ability to maintain expressivity. Random Batch Attention has several significant strengths: (1) Random Batch Attention has linear time complexity. In addition, it can be implemented in parallel along a new dimension, which yields substantial memory savings. (2) Random Batch Attention mechanism can…
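The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general random-batch idea borrowed from particle systems, not the authors' implementation. It assumes that tokens are shuffled, split into equal batches of a fixed size, and that softmax attention is computed only inside each batch. The function names, the `batch_size` parameter, and the NumPy formulation are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def random_batch_attention(Q, K, V, batch_size, rng):
    """Illustrative sketch: attention restricted to random batches of tokens.

    Assumption (not from the paper): n is divisible by batch_size and
    every token attends only to the tokens in its own random batch.
    """
    n, d = Q.shape
    assert n % batch_size == 0, "sketch assumes equal-sized batches"
    perm = rng.permutation(n)  # random assignment of tokens to batches

    # Shuffle, then reshape to (num_batches, batch_size, feature_dim).
    Qb = Q[perm].reshape(-1, batch_size, d)
    Kb = K[perm].reshape(-1, batch_size, d)
    Vb = V[perm].reshape(-1, batch_size, V.shape[1])

    # Per-batch scores have shape (n / p, p, p), so the total cost is
    # O(n * p * d): linear in n when the batch size p is held fixed.
    scores = Qb @ Kb.transpose(0, 2, 1) / np.sqrt(d)
    out_b = softmax(scores, axis=-1) @ Vb

    # Undo the shuffle so row i of the output corresponds to token i.
    out = np.empty((n, V.shape[1]))
    out[perm] = out_b.reshape(n, -1)
    return out

rng = np.random.default_rng(0)
Q = rng.normal(size=(8, 4))
K = rng.normal(size=(8, 4))
V = rng.normal(size=(8, 3))
out = random_batch_attention(Q, K, V, batch_size=4, rng=rng)
```

In this sketch the batches are independent of one another, so the leading batch axis can be processed in parallel, and the largest intermediate array holds n·p scores rather than n². Setting `batch_size` equal to the number of tokens recovers ordinary full attention.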
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Big Data and Digital Economy
