Even Sparser Graph Transformers

Hamed Shirzad; Honghao Lin; Balaji Venkatachalam; Ameya Velingker; David Woodruff; Danica Sutherland

arXiv:2411.16278·cs.LG·November 26, 2024

TL;DR

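This paper introduces Spexphormer, a two-stage training method for graph transformers. A narrow network is first trained on the full augmented graph, and its attention scores are then used to select a much sparser graph on which a wider network is trained, reducing memory usage. The approach is supported by theoretical analysis and empirical results.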

Contribution

The paper proposes a two-stage training approach for graph transformers that maintains performance while significantly reducing memory requirements.

Findings

01 Spexphormer achieves comparable accuracy with less memory.

02 Attention scores are consistent across network widths.

03 Theoretical conditions support the method's effectiveness.

Abstract

Graph Transformers excel in long-range dependency modeling, but generally require quadratic memory complexity in the number of nodes in an input graph, and hence have trouble scaling to large graphs. Sparse attention variants such as Exphormer can help, but may require high-degree augmentations to the input graph for good performance, and do not attempt to sparsify an already-dense input graph. As the learned attention mechanisms tend to use few of these edges, such high-degree connections may be unnecessary. We show (empirically and with theoretical backing) that attention scores on graphs are usually quite consistent across network widths, and use this observation to propose a two-stage procedure, which we call Spexphormer: first, train a narrow network on the full augmented graph. Next, use only the active connections to train a wider network on a much sparser graph. We establish…
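The sparsification step between the two stages can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the narrow network has already produced one attention score per edge, and it keeps the top-k highest-scoring incoming edges per node. The function name `sparsify_by_attention`, the parameter `k`, and the top-k selection rule are all hypothetical; the paper's actual edge-selection procedure may differ.

```python
from collections import defaultdict


def sparsify_by_attention(edges, scores, k):
    """Keep, for each target node, the k incoming edges with the highest score.

    edges  : list of (source, target) pairs from the augmented graph
    scores : attention scores from the narrow network, aligned with `edges`
    k      : max incoming edges kept per target node (hypothetical knob)
    """
    incoming = defaultdict(list)
    for (src, dst), score in zip(edges, scores):
        incoming[dst].append((score, src))

    kept = []
    for dst, candidates in incoming.items():
        # Sort by score, highest first; break ties by source id for determinism.
        candidates.sort(key=lambda c: (-c[0], c[1]))
        kept.extend((src, dst) for _, src in candidates[:k])
    return sorted(kept)


# Toy example: made-up scores standing in for a trained narrow network's attention.
edges = [(0, 1), (2, 1), (1, 0), (2, 0), (0, 2), (1, 2)]
scores = [0.9, 0.1, 0.3, 0.7, 0.5, 0.5]
sparse_edges = sparsify_by_attention(edges, scores, k=1)
print(sparse_edges)
```

In this sketch, the wider second-stage network would then be trained using only `sparse_edges`, so its attention is computed over at most `k` neighbors per node rather than over the full augmented graph.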

Code & Models

Repositories

hamed1375/Sp_Exphormer (PyTorch, official)

Taxonomy

Topics: Advanced Graph Neural Networks · Advanced Memory and Neural Computing · Graph Theory and Algorithms

Methods: Softmax · Attention Is All You Need