Enhancing Graph Transformers with Hierarchical Distance Structural Encoding
Yuankai Luo, Hongkang Li, Lei Shi, Xiao-Ming Wu

TL;DR
This paper introduces Hierarchical Distance Structural Encoding (HDSE) to improve graph transformers by capturing hierarchical and long-range structures, demonstrating superior performance on various large-scale graph tasks.
Contribution
The paper proposes a novel HDSE method for graph transformers, integrating hierarchical distance encoding to enhance expressivity and scalability, with theoretical and empirical validation.
Findings
HDSE is theoretically shown to be more expressive than shortest path distances.
Graph transformers with HDSE excel in classification and regression tasks.
HDSE remains effective on large-scale graphs with up to a billion nodes.
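To make the expressivity finding concrete, the Python sketch below computes a hierarchical distance vector for a node pair. It assumes that HDSE stacks one distance per level of a graph hierarchy: the shortest path distance between the clusters that contain the two nodes, with level 0 being the input graph itself. The cluster assignments (the hypothetical `assignments` argument) are written by hand here. In practice they would come from a graph coarsening or partitioning algorithm, which is not reproduced. This illustrates the intuition only and is not the paper's proof or implementation.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected, unweighted graph


def bfs_distances(graph: Graph, source: int) -> Dict[int, int]:
    """Unweighted shortest path distances from `source` via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def coarsen(graph: Graph, assignment: Dict[int, int]) -> Graph:
    """Merge each cluster into one vertex; two clusters are adjacent if an edge crosses them."""
    coarse: Graph = {c: set() for c in assignment.values()}
    for u, neighbours in graph.items():
        for v in neighbours:
            cu, cv = assignment[u], assignment[v]
            if cu != cv:
                coarse[cu].add(cv)
    return coarse


def hierarchical_distance(graph: Graph, assignments: List[Dict[int, int]],
                          i: int, j: int) -> List[int]:
    """Return [d_0, ..., d_K]: the distance between the level-k vertices containing i and j.

    d_0 is the ordinary shortest path distance. Each (hypothetical) entry of
    `assignments` maps the vertices of one level to the clusters of the next.
    -1 marks an unreachable pair.
    """
    level_graph, vi, vj = graph, i, j
    distances = [bfs_distances(level_graph, vi).get(vj, -1)]
    for assignment in assignments:
        level_graph = coarsen(level_graph, assignment)
        vi, vj = assignment[vi], assignment[vj]
        distances.append(bfs_distances(level_graph, vi).get(vj, -1))
    return distances


if __name__ == "__main__":
    ring = {u: {(u - 1) % 6, (u + 1) % 6} for u in range(6)}  # 6-cycle 0-1-2-3-4-5-0
    clusters = [{0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}]        # one level: two clusters
    print(hierarchical_distance(ring, clusters, 0, 2))  # [2, 0]
    print(hierarchical_distance(ring, clusters, 1, 3))  # [2, 1]
```

Both pairs have shortest path distance 2, so a shortest-path encoding cannot separate them. The level-1 entry, however, tells the intra-cluster pair (0, 2) apart from the cross-cluster pair (1, 3).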
Abstract
Graph transformers need strong inductive biases to derive meaningful attention scores. Yet, current methods often fall short in capturing longer ranges, hierarchical structures, or community structures, which are common in various graphs such as molecules, social networks, and citation networks. This paper presents a Hierarchical Distance Structural Encoding (HDSE) method to model node distances in a graph, focusing on its multi-level, hierarchical nature. We introduce a novel framework to seamlessly integrate HDSE into the attention mechanism of existing graph transformers, allowing for simultaneous application with other positional encodings. To apply graph transformers with HDSE to large-scale graphs, we further propose a high-level HDSE that effectively biases the linear transformers towards graph hierarchies. We theoretically prove the superiority of HDSE over shortest path…
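One way to picture "integrating HDSE into the attention mechanism" is the minimal single-head sketch below. It assumes that the per-level distances are already computed and that HDSE enters attention as an additive bias on the pre-softmax scores, in the spirit of shortest-path attention biases. The per-level lookup table (the hypothetical `bias_table`, learnable in a real model) and the choice to sum levels are illustrative assumptions; the paper's exact parameterisation, multi-head handling, and its high-level HDSE for linear transformers may differ and are not reproduced.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def hdse_bias(distances: List[List[List[int]]], bias_table: List[List[float]]) -> Matrix:
    """Sum per-level lookups: B[i][j] = sum_k bias_table[k][bucket(distances[k][i][j])].

    Distances larger than the table, and -1 ("unreachable"), share the last bucket.
    """
    n = len(distances[0])
    bias = [[0.0] * n for _ in range(n)]
    for level, table in zip(distances, bias_table):
        last = len(table) - 1
        for i in range(n):
            for j in range(n):
                d = level[i][j]
                bucket = last if d < 0 or d > last else d
                bias[i][j] += table[bucket]
    return bias


def biased_attention(q: Matrix, k: Matrix, v: Matrix, bias: Matrix) -> Matrix:
    """Dense single-head attention: softmax(Q K^T / sqrt(d) + B) V."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) + bias[i][j]
                  for j, kj in enumerate(k)]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    # Path 0-1-2; level 1 groups {0, 1} into one cluster and keeps {2} alone.
    d0 = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
    d1 = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
    bias = hdse_bias([d0, d1], [[1.0, 0.5, 0.0], [0.5, 0.0]])
    print(bias[0])  # [1.5, 1.0, 0.0]
    zeros = [[0.0, 0.0]] * 3
    eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(biased_attention(zeros, zeros, eye, bias)[0])  # attention weights of node 0
```

With zero queries and keys, each output row reduces to the softmax of the HDSE bias, so node 0 attends most to itself, then to node 1 in its own cluster, and least to node 2.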
Taxonomy
Topics: Graph Theory and Algorithms · Data Mining Algorithms and Applications · Advanced Graph Neural Networks
Methods: Attention Is All You Need · Absolute Position Encodings · Linear Layer · Byte Pair Encoding · Multi-Head Attention · Laplacian EigenMap · Adam · Residual Connection · Layer Normalization · Dense Connections
