A Generalization of Transformer Networks to Graphs

Vijay Prakash Dwivedi; Xavier Bresson

arXiv:2012.09699 · cs.LG · January 26, 2021 · 322 cites

TL;DR

This paper introduces a generalized transformer architecture for arbitrary graphs that combines neighborhood-aware attention, Laplacian eigenvector positional encodings, batch normalization, and edge features, improving performance on graph-based tasks.
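A minimal NumPy sketch of neighborhood-aware attention, assuming a single head, a dense 0/1 adjacency matrix, and random stand-in weights. The function name and parameters are hypothetical and this is not the authors' code. Scores for non-neighbors are set to negative infinity before the softmax, so each node attends only to the nodes it is connected to.

```python
import numpy as np

def neighborhood_attention(h, adj, w_q, w_k, w_v):
    """Single-head attention in which node i attends only to nodes j with adj[i, j] > 0.

    h:   (n, d) node features
    adj: (n, n) 0/1 adjacency matrix; every row needs at least one nonzero entry
    w_q, w_k, w_v: (d, d_k) projection matrices (hypothetical learned weights)
    """
    if not np.all((adj > 0).any(axis=1)):
        raise ValueError("every node needs at least one neighbor (e.g. add self-loops)")
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    d_k = q.shape[1]
    scores = (q @ k.T) / np.sqrt(d_k)                     # (n, n) scaled dot products
    scores = np.where(adj > 0, scores, -np.inf)           # drop non-neighbors
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)                              # exp(-inf) = 0 for non-neighbors
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights

# Usage: a 4-node path graph 0-1-2-3 with self-loops, random features and weights
rng = np.random.default_rng(0)
adj = np.eye(4) + np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)
h = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = neighborhood_attention(h, adj, w_q, w_k, w_v)
```

Passing `adj = np.ones((n, n))` recovers the fully connected attention of the NLP transformer, which is the contrast the paper draws.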

Contribution

It presents a novel graph transformer with four key modifications, bridging the gap between traditional transformers and graph neural networks for arbitrary graph structures.
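One of the four modifications listed in the abstract is swapping layer normalization for batch normalization. The NumPy sketch below illustrates only the difference in normalization axes and is not the authors' implementation: layer norm standardizes each node over its own features, while batch norm standardizes each feature over all nodes in the batch. The helper names are hypothetical, and running statistics and learnable scale/shift parameters are omitted.

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    # Normalize each node (row) over its own feature dimension.
    mu = h.mean(axis=1, keepdims=True)
    var = h.var(axis=1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def batch_norm(h, eps=1e-5):
    # Normalize each feature (column) over all nodes in the mini-batch.
    # Training-mode statistics only; running averages and the learnable
    # scale/shift parameters are omitted in this sketch.
    mu = h.mean(axis=0, keepdims=True)
    var = h.var(axis=0, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def residual_block(h, sublayer, norm=batch_norm):
    # Post-norm residual connection: norm(h + sublayer(h)), with a swappable norm.
    return norm(h + sublayer(h))

# Usage: 10 nodes (e.g. pooled from several graphs in one batch), 4 features
rng = np.random.default_rng(0)
h = rng.normal(loc=3.0, scale=2.0, size=(10, 4))
w = rng.normal(size=(4, 4))
out_bn = residual_block(h, lambda x: np.maximum(x @ w, 0.0))
out_ln = residual_block(h, lambda x: np.maximum(x @ w, 0.0), norm=layer_norm)
```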

Findings

1. Demonstrates improved performance on graph benchmark tasks (ZINC, PATTERN, CLUSTER).

2. Shows faster training and better generalization when batch normalization replaces layer normalization.

3. Extends the transformer to incorporate edge feature representations.
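A hedged NumPy sketch of one way to fold edge features into graph attention, in the spirit of the paper's edge-feature extension. It assumes dense edge tensors and hypothetical projection names (`w_q`, `w_k`, `w_v`, `w_e`). The per-dimension query-key products for each pair (i, j) are scaled by the projected edge feature e_ij. Their sum gives the attention score, and the same vector serves as a raw edge update. The paper's full layer adds output projections, residuals, normalization, and feed-forward sublayers, all omitted here.

```python
import numpy as np

def edge_aware_attention(h, e, adj, w_q, w_k, w_v, w_e):
    """h: (n, d) node features; e: (n, n, d_e) dense edge features (zeros where no edge);
    adj: (n, n) 0/1 mask, each row with at least one nonzero entry;
    w_q, w_k, w_v: (d, d_k) and w_e: (d_e, d_k) are hypothetical learned projections."""
    q, k, v = h @ w_q, h @ w_k, h @ w_v                   # each (n, d_k)
    d_k = q.shape[1]
    # Per-dimension query-key products for every pair (i, j): (n, n, d_k)
    qk = q[:, None, :] * k[None, :, :] / np.sqrt(d_k)
    # Modulate each pair's products by its projected edge feature: (n, n, d_k)
    w_hat = qk * (e @ w_e)
    scores = w_hat.sum(axis=-1)                          # (n, n) scalar score per pair
    scores = np.where(adj > 0, scores, -np.inf)          # keep neighbors only
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    att = np.exp(scores)
    att = att / att.sum(axis=1, keepdims=True)
    h_out = att @ v                                      # (n, d_k) updated node features
    e_out = w_hat                                        # (n, n, d_k) raw edge update
    return h_out, e_out, att

# Usage: 3-node path graph 0-1-2 with self-loops and 2-dim edge features
rng = np.random.default_rng(1)
adj = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]], dtype=float)
e = rng.normal(size=(3, 3, 2)) * adj[:, :, None]         # zero features on non-edges
h = rng.normal(size=(3, 4))
w_q, w_k, w_v = (rng.normal(size=(4, 4)) for _ in range(3))
w_e = rng.normal(size=(2, 4))
h_out, e_out, att = edge_aware_attention(h, e, adj, w_q, w_k, w_v, w_e)
```

Because e_ij enters multiplicatively, an edge type (for example a bond type in a molecule) can amplify or suppress individual attention dimensions rather than just shifting the score.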

Abstract

We propose a generalization of transformer neural network architecture for arbitrary graphs. The original transformer was designed for Natural Language Processing (NLP), which operates on fully connected graphs representing all connections between the words in a sequence. Such architecture does not leverage the graph connectivity inductive bias, and can perform poorly when the graph topology is important and has not been encoded into the node features. We introduce a graph transformer with four new properties compared to the standard model. First, the attention mechanism is a function of the neighborhood connectivity for each node in the graph. Second, the positional encoding is represented by the Laplacian eigenvectors, which naturally generalize the sinusoidal positional encodings often used in NLP. Third, the layer normalization is replaced by a batch normalization layer, which…
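The Laplacian positional encoding described in the abstract can be sketched in NumPy as follows, assuming the symmetric normalized Laplacian Δ = I − D^{-1/2} A D^{-1/2} and a connected graph. Each node gets the entries of the k eigenvectors with the smallest non-trivial eigenvalues. Eigenvectors are defined only up to sign, so randomly flipping signs during training is a common way to handle the ambiguity; treat that helper, and both function names, as assumptions of this sketch.

```python
import numpy as np

def laplacian_pe(adj, k):
    """k-dimensional positional encoding per node from the eigenvectors of the
    symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2} with the smallest
    non-trivial eigenvalues (assumes a connected graph with more than k nodes)."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5             # isolated nodes stay 0
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)                       # eigenvalues in ascending order
    return eigvecs[:, 1:k + 1]                             # skip the trivial eigenvector

def random_sign_flip(pe, rng):
    # Flip the sign of each eigenvector column at random (e.g. once per training
    # step) so the model cannot rely on one arbitrary sign choice.
    return pe * rng.choice([-1.0, 1.0], size=pe.shape[1])

# Usage: a 6-node cycle graph, 2-dimensional encodings
n = 6
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
pe = laplacian_pe(adj, k=2)
pe_train = random_sign_flip(pe, np.random.default_rng(0))
```

On a cycle graph the Laplacian eigenvectors are discrete sinusoids, which is the sense in which this encoding generalizes the sinusoidal positional encodings used for sequences.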

Taxonomy

Topics: Advanced Graph Neural Networks · Topic Modeling · Machine Learning in Materials Science

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Adam · Byte Pair Encoding · Dropout · Label Smoothing · Dense Connections · Transformer