DHIL-GT: Scalable Graph Transformer with Decoupled Hierarchy Labeling
Ningyi Liao, Zihao Yu, Siqiang Luo

TL;DR
DHIL-GT introduces a scalable graph transformer that precomputes hierarchical graph labels to significantly reduce computational complexity, enabling efficient processing of large graphs while maintaining high accuracy.
Contribution
The paper proposes DHIL-GT, a novel scalable Graph Transformer that decouples graph computation through hierarchical labeling and precomputation, improving efficiency and scalability.
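The contribution above moves all graph computation into a stage that runs once, before training. The sketch below shows that two-stage shape. It assumes a plain adjacency list and NumPy node features. The per-node tokens are a hypothetical stand-in chosen for illustration, not DHIL-GT's hierarchy labels: each token is the mean feature of the nodes at one BFS hop distance, up to a hypothetical `max_hop`. What matters is that training batches only slice a precomputed array and never touch the graph. All function names are hypothetical.

```python
from collections import deque

import numpy as np


def hop_distances(adj, src, max_hop):
    """BFS from src; returns {node: hop distance} for nodes within max_hop."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hop:
            continue  # do not expand beyond the hop budget
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def precompute_tokens(adj, feats, max_hop):
    """Stage 1 (run once): build an (n, max_hop + 1, d) token tensor.

    Token h of node s is the mean feature of nodes exactly h hops from s
    (zeros if none). Hop grouping is a hypothetical stand-in for the
    paper's label hierarchy.
    """
    n, d = feats.shape
    tokens = np.zeros((n, max_hop + 1, d), dtype=feats.dtype)
    for s in range(n):
        dist = hop_distances(adj, s, max_hop)
        for h in range(max_hop + 1):
            members = [v for v, k in dist.items() if k == h]
            if members:
                tokens[s, h] = feats[members].mean(axis=0)
    return tokens


def iterate_batches(tokens, labels, batch_size, rng):
    """Stage 2 (every epoch): shuffle nodes and slice precomputed tokens.

    No adjacency access happens here, so per-batch cost does not depend
    on the size of the graph.
    """
    order = rng.permutation(len(tokens))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield tokens[idx], labels[idx]
```

For example, on the path graph `adj = [[1], [0, 2], [1, 3], [2]]`, node 1's hop-1 token is the average of the features of nodes 0 and 2. Every later epoch then reuses `tokens` without running BFS again.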
Findings
Achieves linear complexity in graph edges and nodes.
Outperforms existing scalable GT models on large benchmarks.
Maintains top-tier accuracy on diverse graph types.
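Counting attention scores makes the linear-complexity claim concrete. The sketch below assumes, hypothetically, that each node attends only within a fixed-length sequence of `k` precomputed tokens. It compares the score count for that setup against one global self-attention layer over all `n` nodes. It also includes a single-head NumPy attention over such sequences in which queries, keys, and values are all the raw tokens; learned projections are omitted as a simplification. The function names are illustrative.

```python
import numpy as np


def full_attention_scores(n):
    """Score entries in one global self-attention layer over n nodes: O(n^2)."""
    return n * n


def per_node_attention_scores(n, k):
    """Score entries when each node attends within its own k tokens: O(n k^2)."""
    return n * k * k


def token_self_attention(x):
    """Single-head softmax attention inside each node's token sequence.

    x has shape (batch, k, d); the score tensor is (batch, k, k), so memory
    grows with the batch size, not with the total number of nodes.
    Q, K and V are all x here (no learned projections).
    """
    d = x.shape[-1]
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x


if __name__ == "__main__":
    n, k = 10**6, 16
    print(full_attention_scores(n), per_node_attention_scores(n, k))
    x = np.random.default_rng(0).normal(size=(8, k, 32))
    print(token_self_attention(x).shape)
```

For n = 10^6 nodes and k = 16, one global layer needs 10^12 scores and the per-node scheme needs 2.56 × 10^8. Doubling n only doubles the second count.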
Abstract
Graph Transformer (GT) has recently emerged as a promising neural network architecture for learning graph-structured data. However, its global attention mechanism, whose complexity is quadratic in the graph size, prevents wider application to large graphs. While current methods attempt to enhance GT scalability by altering the model architecture or encoding hierarchical graph data, our analysis reveals that these models still suffer from a computational bottleneck caused by graph-scale operations. In this work, we target the GT scalability issue and propose DHIL-GT, a scalable Graph Transformer that simplifies network learning by fully decoupling the graph computation into a separate stage performed in advance. DHIL-GT effectively retrieves hierarchical information by exploiting the graph labeling technique, as we show that the graph label hierarchy is more informative than plain adjacency by…
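The abstract names graph labeling but does not specify the scheme. To illustrate the general technique, here is a sketch of pruned landmark labeling. This is a well-known 2-hop distance labeling and is not presented as DHIL-GT's own algorithm. Each vertex stores a small set of (hub, distance) pairs, computed once by pruned BFS. After that, any shortest-path distance can be answered from two label sets alone, with no graph traversal. The function name `build_pll` and its interface are hypothetical, and the sketch assumes an unweighted, undirected graph given as an adjacency list.

```python
from collections import deque


def build_pll(adj):
    """Pruned landmark labeling (illustrative; not the paper's algorithm).

    adj: list of neighbor lists for an unweighted, undirected graph.
    Returns (labels, query): labels[u] maps hub -> hop distance, and
    query(s, t) gives the exact shortest-path distance, or float("inf")
    when s and t are disconnected.
    """
    n = len(adj)
    # Process high-degree vertices first; they tend to cover many shortest paths.
    order = sorted(range(n), key=lambda v: -len(adj[v]))
    labels = [dict() for _ in range(n)]

    def query(s, t):
        ls, lt = labels[s], labels[t]
        if len(ls) > len(lt):
            ls, lt = lt, ls  # iterate over the smaller label set
        best = float("inf")
        for hub, d in ls.items():
            if hub in lt:
                best = min(best, d + lt[hub])
        return best

    for hub in order:
        dist = {hub: 0}
        queue = deque([hub])
        while queue:
            u = queue.popleft()
            d = dist[u]
            # Prune: existing labels already certify a path of length <= d,
            # so u gets no new label and is not expanded.
            if query(hub, u) <= d:
                continue
            labels[u][hub] = d
            for v in adj[u]:
                if v not in dist:
                    dist[v] = d + 1
                    queue.append(v)
    return labels, query
```

On the path graph 0-1-2-3 (`adj = [[1], [0, 2], [1, 3], [2]]`), `query(0, 3)` returns 3 using only the stored labels.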
Taxonomy
Topics: Graph Theory and Algorithms · Graph Labeling and Dimension Problems · Advanced Graph Theory Research
Methods: Attention Is All You Need · Adam · Position-Wise Feed-Forward Layer · Linear Layer · Softmax · Multi-Head Attention · Byte Pair Encoding · Label Smoothing · Dropout · Laplacian Eigenmap
