A Generalization of Transformer Networks to Graphs
Vijay Prakash Dwivedi, Xavier Bresson

TL;DR
This paper introduces a generalized transformer architecture tailored for arbitrary graphs, incorporating neighborhood-aware attention, Laplacian eigenvector positional encoding, and edge features, improving performance on graph-based tasks.
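To make "neighborhood-aware attention" concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a single attention head, a dense adjacency matrix, and no edge features; the function name `neighborhood_attention` and the weight matrices `w_q`, `w_k`, `w_v` are hypothetical names chosen for illustration. Node i is only allowed to attend to nodes j with `adj[i, j] > 0`. Passing an all-ones adjacency matrix recovers ordinary full attention.

```python
import numpy as np

def neighborhood_attention(h, adj, w_q, w_k, w_v):
    """Single-head attention in which node i attends only to nodes j with adj[i, j] > 0."""
    mask = adj > 0
    if not mask.any(axis=1).all():
        # A fully masked row would make the softmax undefined.
        raise ValueError("every node needs at least one neighbor (add self-loops if needed)")
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    scores = (q @ k.T) / np.sqrt(q.shape[1])             # scaled dot-product scores
    scores = np.where(mask, scores, -np.inf)             # drop non-neighbors
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)                             # exp(-inf) is exactly 0
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
n, d = 4, 8
# Path graph 0 - 1 - 2 - 3, without self-loops (an assumption of this example).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
h = rng.normal(size=(n, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

out, weights = neighborhood_attention(h, adj, w_q, w_k, w_v)
# Fully connected case: the same function reduces to standard attention.
full_out, full_weights = neighborhood_attention(h, np.ones((n, n)), w_q, w_k, w_v)
```

Whether a node counts as its own neighbor depends on whether the adjacency matrix contains self-loops; the sketch leaves that choice to the caller.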
Contribution
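The paper presents a graph transformer that modifies the standard transformer in four ways: attention restricted to each node's neighborhood, Laplacian eigenvector positional encodings, batch normalization in place of layer normalization, and edge feature representations. Together these adapt the transformer, originally built for fully connected word graphs, to arbitrary graph structures.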
Findings
Demonstrates improved performance on graph benchmark tasks.
Shows faster training and better generalization when batch normalization replaces layer normalization.
Extends transformer capabilities to include edge feature representations.
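The difference between the two normalization schemes mentioned above comes down to which axis the statistics are computed over. The following is a simplified sketch under stated assumptions: training-mode statistics only, with no learnable scale or shift and no running averages. The function names are illustrative and not taken from the paper's code.

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    """Normalize each node's feature vector across its feature channels."""
    mu = h.mean(axis=1, keepdims=True)
    var = h.var(axis=1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def batch_norm(h, eps=1e-5):
    """Normalize each feature channel across all nodes in the batch."""
    mu = h.mean(axis=0, keepdims=True)
    var = h.var(axis=0, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
h = rng.normal(loc=3.0, scale=2.0, size=(16, 8))  # 16 nodes, 8 feature channels

h_ln = layer_norm(h)  # every row (node) has zero mean
h_bn = batch_norm(h)  # every column (channel) has zero mean
```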
Abstract
We propose a generalization of transformer neural network architecture for arbitrary graphs. The original transformer was designed for Natural Language Processing (NLP), which operates on fully connected graphs representing all connections between the words in a sequence. Such architecture does not leverage the graph connectivity inductive bias, and can perform poorly when the graph topology is important and has not been encoded into the node features. We introduce a graph transformer with four new properties compared to the standard model. First, the attention mechanism is a function of the neighborhood connectivity for each node in the graph. Second, the positional encoding is represented by the Laplacian eigenvectors, which naturally generalize the sinusoidal positional encodings often used in NLP. Third, the layer normalization is replaced by a batch normalization layer, which…
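One way to see why Laplacian eigenvectors generalize sinusoidal encodings: on a ring of nodes, the Laplacian eigenvectors are sampled sines and cosines. The sketch below computes positional encodings from the symmetric normalized Laplacian and keeps the k eigenvectors with the smallest non-trivial eigenvalues. These choices, the dense eigendecomposition, and the function name `laplacian_pe` are assumptions made for illustration rather than details confirmed by the truncated abstract.

```python
import numpy as np

def laplacian_pe(adj, k):
    """Return the k smallest non-trivial eigenpairs of L = I - D^{-1/2} A D^{-1/2}."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    lap = np.eye(adj.shape[0]) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    # Skip index 0: for a connected graph it is the trivial eigenvalue 0.
    return eigvals[1:k + 1], eigvecs[:, 1:k + 1]

# Cycle graph on 8 nodes, i.e. a sequence whose ends are joined.
n = 8
idx = np.arange(n)
adj = np.zeros((n, n))
adj[idx, (idx + 1) % n] = 1.0
adj[(idx + 1) % n, idx] = 1.0

lam, pe = laplacian_pe(adj, k=2)  # pe[i] is the 2-dimensional encoding of node i
```

Eigenvectors are only defined up to sign (and up to rotation when eigenvalues repeat, as they do on the cycle), so any model consuming these encodings has to tolerate that ambiguity.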
Taxonomy
Topics: Advanced Graph Neural Networks · Topic Modeling · Machine Learning in Materials Science
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Adam · Byte Pair Encoding · Dropout · Label Smoothing · Dense Connections · Transformer
