Global Self-Attention as a Replacement for Graph Convolution

Md Shamim Hussain; Mohammed J. Zaki; Dharmashankar Subramanian

arXiv:2108.03348 · cs.LG · June 6, 2022

TL;DR

This paper introduces the Edge-augmented Graph Transformer (EGT), an architecture that learns from graph-structured data using global self-attention together with dedicated edge channels, and that outperforms message-passing graph neural networks on benchmark datasets.

Contribution

The paper presents EGT, a new graph learning model that replaces convolution with global self-attention and incorporates evolving edge channels for enhanced structural learning.

Findings

01. EGT outperforms message-passing GNNs on benchmark datasets.

02. EGT achieves state-of-the-art results on the OGB-LSC PCQM4Mv2 quantum-chemical regression task.

03. Global self-attention can effectively replace local convolutional aggregation in graph learning.

Abstract

We propose an extension to the transformer neural network architecture for general-purpose graph learning by adding a dedicated pathway for pairwise structural information, called edge channels. The resultant framework - which we call Edge-augmented Graph Transformer (EGT) - can directly accept, process and output structural information of arbitrary form, which is important for effective learning on graph-structured data. Our model exclusively uses global self-attention as an aggregation mechanism rather than static localized convolutional aggregation. This allows for unconstrained long-range dynamic interactions between nodes. Moreover, the edge channels allow the structural information to evolve from layer to layer, and prediction tasks on edges/links can be performed directly from the output embeddings of these channels. We verify the performance of EGT in a wide range of…
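
The mechanism described in the abstract can be sketched roughly as follows. This is a minimal single-head NumPy illustration under stated assumptions, not the authors' implementation. The function and parameter names (`egt_like_layer`, `W_q`, `W_k`, `W_v`, `w_bias`, `w_edge`), the additive edge bias, and the residual updates are illustrative choices. Multiple heads, normalization, feed-forward sublayers, and any other components of the published model are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    ex = np.exp(x)
    return ex / ex.sum(axis=axis, keepdims=True)


def init_params(d, d_e, rng, scale=0.1):
    """Random weights for the sketch; all names here are hypothetical."""
    return {
        "W_q": rng.normal(scale=scale, size=(d, d)),
        "W_k": rng.normal(scale=scale, size=(d, d)),
        "W_v": rng.normal(scale=scale, size=(d, d)),
        "w_bias": rng.normal(scale=scale, size=(d_e,)),  # edge channels -> attention bias
        "w_edge": rng.normal(scale=scale, size=(d_e,)),  # attention scores -> edge update
    }


def egt_like_layer(h, e, p):
    """One simplified edge-augmented attention layer (assumed form).

    h: (n, d) node embeddings.
    e: (n, n, d_e) edge-channel embeddings, one vector per node pair.
    Returns updated node embeddings, updated edge channels, and attention weights.
    """
    n, d = h.shape
    q, k, v = h @ p["W_q"], h @ p["W_k"], h @ p["W_v"]

    # Global attention: every node scores every other node, with no adjacency mask.
    logits = q @ k.T / np.sqrt(d)          # (n, n)

    # Structural information enters as a pairwise bias read from the edge channels.
    scores = logits + e @ p["w_bias"]      # (n, n)
    attn = softmax(scores, axis=-1)        # each row sums to 1

    # Node pathway: aggregate values from all nodes (residual update).
    h_new = h + attn @ v                   # (n, d)

    # Edge pathway: pairwise scores are written back so edge channels evolve per layer.
    e_new = e + scores[:, :, None] * p["w_edge"]   # (n, n, d_e)
    return h_new, e_new, attn


rng = np.random.default_rng(0)
n, d, d_e = 5, 8, 4
h = rng.normal(size=(n, d))
e = rng.normal(size=(n, n, d_e))
p = init_params(d, d_e, rng)

# Stack two layers: both node and edge representations are carried forward.
h1, e1, attn = egt_like_layer(h, e, p)
h2, e2, _ = egt_like_layer(h1, e1, p)
print(attn.shape, h2.shape, e2.shape)
```

In this sketch, node-level predictions would be read from `h2` and edge or link predictions from `e2`, mirroring the abstract's point that edge-level tasks can be performed directly from the output of the edge channels. Because no adjacency mask is applied, every attention weight is nonzero, which is what distinguishes this form of aggregation from localized convolution.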

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Convolution · Edge-augmented Graph Transformer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Laplacian EigenMap · Adam · Laplacian Positional Encodings