Attending to Graph Transformers
Luis Müller, Mikhail Galkin, Christopher Morris, Ladislav Rampášek

TL;DR
This paper provides a comprehensive taxonomy, theoretical analysis, and empirical evaluation of graph transformer architectures, highlighting their capabilities, limitations, and potential for future research in graph machine learning.
Contribution
It introduces a taxonomy of graph transformers, surveys their properties, and empirically evaluates their ability to recover graph features and handle complex graph types.
Findings
Graph transformers can recover various graph properties effectively.
They show potential in handling heterophilic graphs.
They help mitigate over-squashing in graph neural networks.
Abstract
Recently, transformer architectures for graphs emerged as an alternative to established techniques for machine learning with graphs, such as (message-passing) graph neural networks. So far, they have shown promising empirical results, e.g., on molecular prediction datasets, often attributed to their ability to circumvent graph neural networks' shortcomings, such as over-smoothing and over-squashing. Here, we derive a taxonomy of graph transformer architectures, bringing some order to this emerging field. We overview their theoretical properties, survey structural and positional encodings, and discuss extensions for important graph classes, e.g., 3D molecular graphs. Empirically, we probe how well graph transformers can recover various graph properties, how well they can deal with heterophilic graphs, and to what extent they prevent over-squashing. Further, we outline open challenges and…
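To make the contrast with message passing concrete, here is a minimal sketch, not the paper's method. It compares attention restricted to graph neighbours with unrestricted (global) attention on a toy path graph. The function names, the 5-node graph, and the feature values are all assumptions chosen for illustration. With global attention, node 0 places nonzero weight on the far end of the path in a single layer, which is the intuition usually given for why graph transformers may ease over-squashing.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats (may contain -inf)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(features, adjacency=None):
    """Scaled dot-product attention weights, one row per node.

    If `adjacency` is given, each node attends only to itself and its
    neighbours (a message-passing-like restriction). Otherwise every
    node attends to all nodes, as in a global graph transformer layer.
    Queries and keys are the raw features here (no learned projections),
    which is a simplification made for this sketch.
    """
    n = len(features)
    d = len(features[0])
    weights = []
    for i in range(n):
        scores = []
        for j in range(n):
            allowed = adjacency is None or i == j or adjacency[i][j]
            if allowed:
                dot = sum(a * b for a, b in zip(features[i], features[j]))
                scores.append(dot / math.sqrt(d))
            else:
                scores.append(-math.inf)  # masked out: weight becomes 0
        weights.append(softmax(scores))
    return weights


# Hypothetical toy input: a path graph 0 - 1 - 2 - 3 - 4 with 2-d features.
n_nodes = 5
adjacency = [[abs(i - j) == 1 for j in range(n_nodes)] for i in range(n_nodes)]
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]

local_w = attention_weights(features, adjacency)  # neighbours only
global_w = attention_weights(features)            # all pairs

print("local  weight 0 -> 4:", local_w[0][4])
print("global weight 0 -> 4:", global_w[0][4])
```

In the neighbour-restricted case, information from node 4 would need four stacked layers to reach node 0. The global variant reaches it in one layer but discards graph structure, which is why the structural and positional encodings surveyed in the paper matter.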
Taxonomy
Topics: Machine Learning in Materials Science · Advanced Graph Neural Networks · Graph Theory and Algorithms
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Layer Normalization · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Label Smoothing · Softmax · Residual Connection
