Distinguished In Uniform: Self Attention Vs. Virtual Nodes
Eran Rosenbluth, Jan Tönshoff, Martin Ritzert, Berke Kisin, Martin Grohe

TL;DR
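This paper compares the uniform expressivity of Graph Transformers, which use global self-attention, with that of MPGNNs augmented with Virtual Nodes. It shows that neither is a uniform universal approximator, i.e. neither can approximate every target function with a single network for graphs of all sizes, and it reports experiments on synthetic and real data.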
Contribution
It provides a theoretical comparison of the uniform expressivity of GTs and MPGNNs with Virtual Nodes, revealing neither model subsumes the other's capabilities.
Findings
Neither model is a uniform-universal approximator.
Neither model's uniform expressivity subsumes the other's.
Experiments on synthetic and real data give mixed results, with no clear practical winner.
Abstract
Graph Transformers (GTs) such as SAN and GPS are graph processing models that combine Message-Passing GNNs (MPGNNs) with global Self-Attention. They were shown to be universal function approximators, with two reservations: 1. The initial node features must be augmented with certain positional encodings. 2. The approximation is non-uniform: Graphs of different sizes may require a different approximating network. We first clarify that this form of universality is not unique to GTs: Using the same positional encodings, also pure MPGNNs and even 2-layer MLPs are non-uniform universal approximators. We then consider uniform expressivity: The target function is to be approximated by a single network for graphs of all sizes. There, we compare GTs to the more efficient MPGNN + Virtual Node architecture. The essential difference between the two model definitions is in their global computation…
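The two kinds of global computation named in the title can be contrasted with a minimal sketch. It is an illustration under stated assumptions, not the paper's formal definitions: the virtual node is assumed to sum-pool all node features and broadcast the result back, and self-attention is assumed to be single-head dot-product attention with identity query, key, and value projections. The function names are hypothetical.

```python
import math


def virtual_node_update(features):
    """Simplified virtual-node step (assumption: sum pooling, additive broadcast).

    Costs O(n) for n nodes: one pass to pool, one pass to broadcast.
    """
    dim = len(features[0])
    pooled = [sum(f[d] for f in features) for d in range(dim)]
    return [[x + p for x, p in zip(f, pooled)] for f in features]


def self_attention_update(features):
    """Simplified self-attention step (assumption: identity Q/K/V, one head).

    Costs O(n^2) for n nodes: every node scores every other node.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    updated = []
    for query in features:
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in features]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Softmax-weighted average of all node features.
        updated.append(
            [sum(w * v[d] for w, v in zip(weights, features)) / total for d in range(dim)]
        )
    return updated


if __name__ == "__main__":
    graph = [[1.0, 0.0], [0.0, 1.0]]
    doubled = graph * 2  # same node features, twice as many nodes
    print(virtual_node_update(graph)[0], virtual_node_update(doubled)[0])
    print(self_attention_update(graph)[0], self_attention_update(doubled)[0])
```

In this toy setting, duplicating every node changes the sum-pooled virtual-node output but leaves the attention output unchanged, because softmax attention is a weighted average. This illustrates one way the two global mechanisms can behave differently as graph size varies; it does not reproduce the paper's proofs.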
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Bayesian Modeling and Causal Inference
Methods: Goal-Driven Tree-Structured Neural Model · Greedy Policy Search
