NAGphormer: A Tokenized Graph Transformer for Node Classification in Large Graphs
Jinsong Chen, Kaiyuan Gao, Gaichao Li, Kun He

TL;DR
NAGphormer introduces a scalable graph Transformer that represents each node as a tokenized sequence of aggregated neighborhood features. This enables mini-batch node classification on large graphs and outperforms existing methods.
Contribution
The paper proposes NAGphormer, a novel tokenized graph Transformer that uses Hop2Token for neighborhood aggregation, allowing scalable training on large graphs and outperforming existing models.
Findings
NAGphormer outperforms existing graph Transformers on benchmark datasets.
The method scales effectively to large graphs due to mini-batch training.
Mathematical analysis shows it learns more informative node representations.
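The scalability finding follows from where self-attention is computed. Below is a minimal numpy sketch. It assumes each node already has a sequence of K+1 hop tokens of width d, and it runs single-head scaled dot-product attention over a sampled mini-batch of such sequences. The names `batch_self_attention`, `w_q`, `w_k`, `w_v` and all sizes are hypothetical. The random tokens stand in for real precomputed ones. The layer is a stripped-down illustration, not NAGphormer's encoder: it omits multi-head projections, layer normalization, and dropout.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def batch_self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product attention inside each node's own sequence.

    tokens: (B, K+1, d). The score tensor is (B, K+1, K+1), so its size depends
    on the batch size and the number of hops, not on the total node count N.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
N, K, d, d_k, B = 1000, 3, 16, 8, 32          # hypothetical sizes
all_tokens = rng.normal(size=(N, K + 1, d))   # stand-in for precomputed hop tokens
w_q, w_k, w_v = (rng.normal(size=(d, d_k)) for _ in range(3))

batch_idx = rng.choice(N, size=B, replace=False)  # sample a mini-batch of nodes
out, weights = batch_self_attention(all_tokens[batch_idx], w_q, w_k, w_v)
print(out.shape, weights.shape)  # (32, 4, 8) (32, 4, 4)

# Score entries: one attention over all N nodes vs. one pass over all N node sequences.
print(N * N, N * (K + 1) ** 2)   # 1000000 16000
```

In this sketch, attention never crosses node boundaries. Each node attends only among its own K+1 hop tokens, so memory per training step is bounded by B(K+1)^2 attention scores rather than N^2.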
Abstract
The graph Transformer has emerged as a new architecture and has shown superior performance on various graph mining tasks. In this work, we observe that existing graph Transformers treat nodes as independent tokens and construct a single long sequence composed of all node tokens to train the Transformer model. Because self-attention has quadratic complexity in the number of nodes, this makes such models hard to scale to large graphs. To this end, we propose a Neighborhood Aggregation Graph Transformer (NAGphormer) that treats each node as a sequence containing a series of tokens constructed by our proposed Hop2Token module. For each node, Hop2Token aggregates the neighborhood features from different hops into different representations and thereby produces a sequence of token vectors as one input. In this way, NAGphormer could be trained in a mini-batch manner and thus could scale…
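The Hop2Token step described in the abstract can be sketched in a few lines of numpy. This is a minimal illustration under stated assumptions, not the authors' code:

- The propagation matrix is taken to be the symmetric normalized adjacency Â = D^{-1/2}(A + I)D^{-1/2}, built densely.
- The self-loops, the dense matrices, and the function name `hop2token` are illustrative choices.
- The Laplacian eigenvector positional encodings listed among the paper's methods are omitted.

For each of the N nodes, the sketch returns a sequence of K+1 tokens [X, ÂX, ..., Â^K X], each of width d.

```python
import numpy as np


def hop2token(adj: np.ndarray, features: np.ndarray, num_hops: int) -> np.ndarray:
    """Return per-node token sequences [X, ÂX, Â^2 X, ..., Â^K X], shape (N, K+1, d).

    Assumption for illustration: Â = D^{-1/2} (A + I) D^{-1/2}, built densely.
    """
    n = adj.shape[0]
    a = adj + np.eye(n)                      # assumed: add self-loops so every degree >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    a_hat = d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]  # symmetric normalization

    tokens = [features]                      # hop 0: the node's own features
    current = features
    for _ in range(num_hops):
        current = a_hat @ current            # aggregate one more hop of neighbors
        tokens.append(current)
    return np.stack(tokens, axis=1)          # token axis inserted at position 1


# Toy path graph 0 - 1 - 2 with one-hot features.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = np.eye(3)
seq = hop2token(A, X, num_hops=2)
print(seq.shape)        # (3, 3, 3): 3 nodes, 3 tokens (hops 0..2), 3 features
batch = seq[[0, 2]]     # a mini-batch of two nodes' sequences
```

In this sketch, the sequences are precomputed once, before training. A mini-batch is then just a row slice such as `seq[[0, 2]]`, so a training step never needs the full graph in memory. For genuinely large graphs, a sparse Â would replace the dense one.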
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Complex Network Analysis Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Laplacian EigenMap · Layer Normalization · Label Smoothing · Softmax · Absolute Position Encodings · Dropout · Adam
