Adaptive Multi-Neighborhood Attention based Transformer for Graph Representation Learning
Gaichao Li, Jinsong Chen, Kun He

TL;DR
This paper introduces MNA-GT, an adaptive graph Transformer that dynamically captures multi-hop neighborhood information for improved graph representation learning, outperforming existing methods across various benchmarks.
Contribution
The paper proposes an adaptive multi-neighborhood attention mechanism within Transformers, enabling flexible and effective graph structural feature extraction for diverse graph topologies.
Findings
MNA-GT outperforms strong baselines on multiple graph benchmarks.
The adaptive attention mechanism effectively captures structural information.
The model demonstrates robustness across different graph types.
Abstract
By incorporating graph structural information into Transformers, graph Transformers have exhibited promising performance for graph representation learning in recent years. Existing graph Transformers leverage specific strategies, such as Laplacian eigenvectors and shortest paths between node pairs, to preserve the structural features of nodes and feed them into the vanilla Transformer to learn node representations. It is hard for such predefined rules to extract informative structural features from arbitrary graphs whose topologies vary greatly, which limits the learning capacity of the models. To this end, we propose an adaptive graph Transformer, termed Multi-Neighborhood Attention based Graph Transformer (MNA-GT), which adaptively captures the graph structural information of each node through a multi-neighborhood attention mechanism. By defining the input to…
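The abstract is cut off before it specifies how the attention inputs are defined, so the following is only a minimal NumPy sketch of one plausible reading of "multi-neighborhood attention", not the authors' implementation. It assumes, hypothetically, that hop-k node representations come from propagating features with a symmetrically normalized adjacency matrix, that each hop gets its own scaled dot-product attention kernel, and that a per-node softmax gate (the hypothetical parameter `w_gate`) adaptively mixes the kernel outputs. All function and parameter names are illustrative.

```python
import numpy as np

def normalized_adjacency(adj):
    # Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def hop_features(adj, x, num_hops):
    # Stack [X, ÂX, Â^2 X, ..., Â^K X] -> shape (K+1, N, d)
    a_hat = normalized_adjacency(adj)
    feats = [x]
    for _ in range(num_hops):
        feats.append(a_hat @ feats[-1])
    return np.stack(feats)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_neighborhood_attention(adj, x, num_hops, wq, wk, wv, w_gate):
    # Hypothetical design: one attention kernel per neighborhood radius.
    h = hop_features(adj, x, num_hops)
    outs = []
    for hk in h:
        q, k, v = hk @ wq, hk @ wk, hk @ wv          # each (N, d_k)
        scores = q @ k.T / np.sqrt(q.shape[-1])     # (N, N)
        outs.append(softmax(scores, axis=-1) @ v)   # (N, d_k)
    outs = np.stack(outs)                           # (K+1, N, d_k)
    # Adaptive per-node mixing over hops; w_gate is a hypothetical (d_k,) vector.
    alpha = softmax(outs @ w_gate, axis=0)          # (K+1, N), sums to 1 over hops
    return (alpha[..., None] * outs).sum(axis=0), alpha

# Usage on a 4-node path graph with random features and weights.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = rng.normal(size=(4, 8))
d_k = 8
wq, wk, wv = [rng.normal(size=(8, d_k)) for _ in range(3)]
w_gate = rng.normal(size=d_k)
out, alpha = multi_neighborhood_attention(adj, x, 2, wq, wk, wv, w_gate)
print(out.shape, alpha.shape)  # (4, 8) (3, 4)
```

Because the gate weights `alpha` sum to one over hops for each node, every node can lean on a different neighborhood radius. This is one simple way to make the structural receptive field learned rather than fixed by a predefined rule such as Laplacian eigenvectors or shortest paths.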
Taxonomy
Topics: Advanced Graph Neural Networks
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Layer Normalization · Softmax · Adam · Absolute Position Encodings
