Are More Layers Beneficial to Graph Transformers?
Haiteng Zhao, Shuming Ma, Dongdong Zhang, Zhi-Hong Deng, Furu Wei

TL;DR
This paper investigates the depth limitations of graph transformers, identifies the bottleneck caused by global attention, and proposes DeepGraph, a model that incorporates substructure tokens and local attention to enable deeper, more expressive graph transformers with state-of-the-art results.
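The TL;DR attributes the depth bottleneck to global attention. The toy script below illustrates one intuition for why attention can lose its ability to focus. It is our illustration, not the paper's analysis or its definition of attention capacity. It assumes that deeper layers make node representations more alike, which it mimics with a hypothetical `blend_toward_mean` helper. It then measures how the entropy of a single query's softmax attention approaches the uniform maximum of log n.

```python
import math
import random

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_entropy(query, keys):
    # Scaled dot-product attention weights of one query over all keys,
    # summarized by their Shannon entropy in nats (log n means perfectly uniform).
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return -sum(w * math.log(w) for w in weights if w > 0)

def blend_toward_mean(features, alpha):
    # Hypothetical stand-in for depth (an assumption, not the paper's model):
    # mix each node vector with the graph-wide mean; alpha=1 makes all nodes identical.
    n, d = len(features), len(features[0])
    mean = [sum(f[j] for f in features) / n for j in range(d)]
    return [[(1 - alpha) * f[j] + alpha * mean[j] for j in range(d)] for f in features]

# Usage: 16 random nodes with 8-dim features; a larger alpha stands in for "deeper".
random.seed(0)
nodes = [[random.gauss(0.0, 3.0) for _ in range(8)] for _ in range(16)]
for alpha in (0.0, 0.5, 0.9, 1.0):
    h = blend_toward_mean(nodes, alpha)
    ent = attention_entropy(h[0], h)
    print(f"alpha={alpha:.1f}  entropy={ent:.3f}  uniform max={math.log(len(h)):.3f}")
```

When every node vector is identical (alpha = 1), all attention scores tie and the weights become exactly uniform, so attention cannot single out any node or substructure.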
Contribution
The paper introduces DeepGraph, a novel graph transformer architecture that uses substructure tokens and local attention to overcome depth limitations and improve expressiveness.
Findings
DeepGraph enables deeper graph transformer models.
The proposed method achieves state-of-the-art performance on various benchmarks.
DeepGraph effectively focuses on critical substructures within graphs.
Abstract
Although going deep has proven successful in many neural architectures, existing graph transformers are relatively shallow. In this work, we explore whether more layers are beneficial to graph transformers and find that current graph transformers hit a bottleneck: increasing depth does not keep improving performance. Further analysis reveals that deep graph transformers are limited by the vanishing capacity of global attention, which restricts the model from focusing on critical substructures and obtaining expressive features. To address this, we propose a novel graph transformer model named DeepGraph that explicitly employs substructure tokens in the encoded representation and applies local attention on related nodes to obtain substructure-based attention encoding. Our model enhances the ability of the global attention to focus on substructures and…
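The abstract describes substructure tokens combined with local attention on related nodes. The sketch below shows one way such a layer could be wired, under assumptions that are ours rather than the paper's: each substructure token is initialized as the mean of its member nodes, node tokens keep global attention over the whole sequence, each substructure token attends only to its own member nodes and itself, and the query/key/value projections are the identity. The helpers `build_tokens`, `build_mask`, and `masked_attention` are hypothetical names, not DeepGraph's code.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mean_vector(vectors):
    # Element-wise mean of equal-length vectors.
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]

def build_tokens(node_feats, substructures):
    # Hypothetical: append one token per substructure, initialized as the mean of its members.
    sub_tokens = [mean_vector([node_feats[i] for i in members]) for members in substructures]
    return node_feats + sub_tokens

def build_mask(num_nodes, substructures):
    # mask[i][j] is True when token i may attend to token j.
    total = num_nodes + len(substructures)
    # Node tokens: global attention over every token (nodes and substructures).
    mask = [[True] * total for _ in range(num_nodes)]
    for s, members in enumerate(substructures):
        row = [False] * total
        for i in members:
            row[i] = True  # local attention to the substructure's own nodes
        row[num_nodes + s] = True  # plus the substructure token itself
        mask.append(row)
    return mask

def masked_attention(tokens, mask):
    # Single-head scaled dot-product attention restricted by the mask.
    # Simplification: Q, K, V are the raw token vectors (identity projections).
    d = len(tokens[0])
    out = []
    for i, q in enumerate(tokens):
        allowed = [j for j, ok in enumerate(mask[i]) if ok]
        scores = [sum(a * b for a, b in zip(q, tokens[j])) / math.sqrt(d) for j in allowed]
        weights = softmax(scores)
        out.append([sum(w * tokens[j][k] for w, j in zip(weights, allowed)) for k in range(d)])
    return out

# Usage: four nodes and one substructure (e.g. a triangle on nodes 0, 1, 2).
nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
substructures = [[0, 1, 2]]
tokens = build_tokens(nodes, substructures)
mask = build_mask(len(nodes), substructures)
out = masked_attention(tokens, mask)
```

Because the substructure token's attention row excludes node 3, changing that node's features leaves the substructure token's output unchanged. A real implementation would use learned projections, multiple heads, and a batched tensor mask.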
Taxonomy
Topics: Advanced Graph Neural Networks · Online Learning and Analytics · Graph Theory and Algorithms
Methods: Attention Is All You Need · Laplacian EigenMap · Linear Layer · Absolute Position Encodings · Label Smoothing · Softmax · Adam · Layer Normalization · Residual Connection · Laplacian Positional Encodings
