Hierarchical Graph Transformer with Adaptive Node Sampling
Zaixi Zhang, Qi Liu, Qingyong Hu, Chee-Kong Lee

TL;DR
This paper introduces a hierarchical graph transformer with adaptive node sampling. Hierarchical attention lets it capture long-range dependencies on large graphs, and the sampling strategy is learned during training through an adversarial bandit formulation.
Contribution
It proposes a hierarchical attention scheme built on graph coarsening, together with adaptive node sampling formulated as an adversarial bandit problem, to improve graph transformer performance.
Findings
Outperforms existing graph transformers on real-world datasets
Effectively captures long-range dependencies in large graphs
Reduces computational complexity with hierarchical attention
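The complexity claim above can be illustrated with a small sketch. It is not the authors' implementation: it assumes a precomputed cluster assignment (the hypothetical `assignment` array), mean pooling for coarsening, and plain dot-product attention without learned projections. Each node attends to a handful of sampled nodes plus the coarsened supernodes, so the per-node cost scales with the number of sampled nodes and clusters rather than with the full node count.

```python
import numpy as np


def coarsen(features, assignment, n_clusters):
    """Mean-pool node features into one feature vector per supernode."""
    pooled = np.zeros((n_clusters, features.shape[1]))
    np.add.at(pooled, assignment, features)
    counts = np.bincount(assignment, minlength=n_clusters).astype(float)
    # Guard against empty clusters to avoid division by zero.
    return pooled / np.maximum(counts, 1.0)[:, None]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scores = keys @ query / np.sqrt(query.shape[0])
    scores = scores - scores.max()  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()
    return weights @ values, weights


def hierarchical_attention(features, assignment, n_clusters, center, sampled):
    """Attend from one center node to sampled nodes plus all supernodes.

    Assumption: keys and values are the raw features; a real model
    would apply learned query/key/value projections first.
    """
    super_feats = coarsen(features, assignment, n_clusters)
    keys = np.vstack([features[sampled], super_feats])
    return attend(features[center], keys, keys)


# Toy graph: 6 nodes, 2-dimensional features, 3 clusters of 2 nodes each.
features = np.arange(12, dtype=float).reshape(6, 2)
assignment = np.array([0, 0, 1, 1, 2, 2])
output, weights = hierarchical_attention(
    features, assignment, n_clusters=3, center=0, sampled=[1, 3]
)
```

Here node 0 attends to 5 keys (2 sampled nodes and 3 supernodes) instead of all 6 nodes; on a large graph the gap between those two numbers is what reduces the attention cost.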
Abstract
The Transformer architecture has achieved remarkable success in a number of domains including natural language processing and computer vision. However, when it comes to graph-structured data, transformers have not achieved competitive performance, especially on large graphs. In this paper, we identify the main deficiencies of current graph transformers: (1) Existing node sampling strategies in Graph Transformers are agnostic to the graph characteristics and the training process. (2) Most sampling strategies only focus on local neighbors and neglect the long-range dependencies in the graph. We conduct experimental investigations on synthetic datasets to show that existing sampling strategies are sub-optimal. To tackle the aforementioned problems, we formulate the optimization strategies of node sampling in Graph Transformer as an adversary bandit problem, where the rewards are related to…
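To make the bandit formulation in the abstract more concrete, here is a minimal sketch of EXP3, a standard algorithm for adversarial bandits, choosing among a few candidate sampling strategies. It is an illustration under stated assumptions, not the paper's algorithm: the abstract is truncated before it defines the rewards, so the `Exp3Sampler` class, the four arms, and the fixed toy rewards in `run_demo` are all hypothetical stand-ins.

```python
import numpy as np


class Exp3Sampler:
    """EXP3 over a fixed set of arms (here: candidate sampling strategies)."""

    def __init__(self, n_arms, gamma=0.1, seed=0):
        self.n_arms = n_arms
        self.gamma = gamma  # exploration rate in (0, 1]
        self.log_weights = np.zeros(n_arms)
        self.rng = np.random.default_rng(seed)

    def probabilities(self):
        # Exponential weights mixed with uniform exploration.
        w = np.exp(self.log_weights - self.log_weights.max())
        return (1.0 - self.gamma) * w / w.sum() + self.gamma / self.n_arms

    def draw(self):
        p = self.probabilities()
        arm = int(self.rng.choice(self.n_arms, p=p))
        return arm, p[arm]

    def update(self, arm, reward, prob):
        # Importance-weighted reward estimate; reward is assumed in [0, 1].
        self.log_weights[arm] += self.gamma * (reward / prob) / self.n_arms


def run_demo(rounds=2000):
    """Toy loop: arm 2 pays 0.9, every other arm pays 0.1 (made-up rewards)."""
    toy_rewards = np.array([0.1, 0.1, 0.9, 0.1])
    sampler = Exp3Sampler(n_arms=len(toy_rewards))
    for _ in range(rounds):
        arm, prob = sampler.draw()
        sampler.update(arm, toy_rewards[arm], prob)
    return sampler.probabilities()


final_probs = run_demo()
```

In a training loop, each arm would correspond to one way of picking the nodes a center node attends to, and the reward would come from a signal computed during training. Because EXP3 makes no stationarity assumption about rewards, it remains applicable when the usefulness of a strategy shifts as the model trains.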
Taxonomy
Topics: Advanced Graph Neural Networks · Online Learning and Analytics
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Laplacian EigenMap · Position-Wise Feed-Forward Layer · Residual Connection · Dropout · Laplacian Positional Encodings · Softmax · Label Smoothing
