On the Stability of Expressive Positional Encodings for Graphs
Yinan Huang, William Lu, Joshua Robinson, Yu Yang, Muhan Zhang, Stefanie Jegelka, Pan Li

TL;DR
This paper introduces SPE, a stable and expressive architecture for positional encodings in graph neural networks, addressing eigenvector instability issues and improving generalization in graph tasks.
Contribution
SPE is the first architecture that guarantees stability and universal expressiveness for eigenvector-based positional encodings in graphs.
Findings
SPE outperforms existing methods in molecular property prediction.
SPE demonstrates improved out-of-distribution generalization.
SPE is provably stable and as expressive as current approaches.
Abstract
Designing effective positional encodings for graphs is key to building powerful graph transformers and enhancing message-passing graph neural networks. Although widespread, using Laplacian eigenvectors as positional encodings faces two fundamental challenges: (1) Non-uniqueness: there are many different eigendecompositions of the same Laplacian, and (2) Instability: small perturbations to the Laplacian could result in completely different eigenspaces, leading to unpredictable changes in positional encoding. Despite many attempts to address non-uniqueness, most methods overlook stability, leading to poor generalization on unseen graph structures. We identify the cause of instability to be a "hard partition" of eigenspaces. Hence, we introduce Stable and Expressive Positional Encodings (SPE), an architecture for processing eigenvectors that uses eigenvalues to "softly…
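The non-uniqueness problem and the eigenvalue-weighted remedy can be illustrated with a small sketch. This is not the SPE architecture itself (the abstract above is truncated and does not specify it); it only evaluates a term of the form V diag(φ(λ)) V^T, the expression quoted in the peer reviews below, on the triangle graph. The function names and the choice φ(λ) = exp(-λ) are assumptions made for illustration, standing in for whatever functions the paper actually uses.

```python
import math

def soft_encoding(eigvals, eigvecs, phi):
    """Return V diag(phi(lambda)) V^T as a nested list.

    eigvecs is a list of eigenvectors (one list per eigenvector),
    paired index-by-index with eigvals.
    """
    n = len(eigvecs[0])
    weights = [phi(lam) for lam in eigvals]
    return [[sum(w * v[i] * v[j] for w, v in zip(weights, eigvecs))
             for j in range(n)]
            for i in range(n)]

def max_abs_diff(a, b):
    """Largest entrywise absolute difference between two nested lists."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Triangle graph K3: its Laplacian has eigenvalues 0, 3, 3.
eigvals = [0.0, 3.0, 3.0]
u0 = [1 / math.sqrt(3)] * 3
u1 = [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]
u2 = [1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6)]
basis_a = [u0, u1, u2]

# A second, equally valid eigendecomposition: flip the sign of u0 and
# rotate (u1, u2) inside the two-dimensional eigenspace of eigenvalue 3.
theta = 0.7  # arbitrary rotation angle (illustrative)
c, s = math.cos(theta), math.sin(theta)
w1 = [c * a + s * b for a, b in zip(u1, u2)]
w2 = [-s * a + c * b for a, b in zip(u1, u2)]
basis_b = [[-x for x in u0], w1, w2]

def phi(lam):
    # Hypothetical smooth weighting of eigenvalues, not taken from the paper.
    return math.exp(-lam)

enc_a = soft_encoding(eigvals, basis_a, phi)
enc_b = soft_encoding(eigvals, basis_b, phi)

raw_gap = max_abs_diff(basis_a, basis_b)  # raw eigenvectors disagree
enc_gap = max_abs_diff(enc_a, enc_b)      # weighted encodings agree

# Sanity check: with phi(lambda) = lambda the construction rebuilds the Laplacian.
laplacian = soft_encoding(eigvals, basis_a, lambda lam: lam)
```

Here `raw_gap` is large because the two bases differ by a sign flip and a rotation, while `enc_gap` is zero up to floating-point error: φ depends only on the eigenvalue, so every eigenvector within a repeated eigenspace receives the same weight and the choice of basis cancels out.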
Peer Reviews
Decision·ICLR 2024 poster
Extremely well written and easy to follow. Figure 1 is nice! The proposed model SPE (Eq. 2) is a simple generalization of BasisNet. It is very interesting to see that such a straightforward modification yields such strong (and non-trivial) theoretical results (Theorem 1, etc.). Empirical performance is very convincing, e.g. for ZINC and the OOD tests. ------ during rebuttal ------ as reviewers addressed my concerns and in fact added clarifications on runtime and more importantly on
The achieved theoretical and empirical results, while very interesting, seem somewhat incremental compared to Wang et al. (ICLR 2022) and Lim & Robinson et al. (ICLR 2023). The authors should discuss the differences more clearly. In particular: * The reason for Hölder continuity ($c\neq1$) is not fully clear. E.g. does PEG / standard BasisNet already satisfy your stability criterion Def 3.1 and / or Assumption 3.1? If yes, what is the conceptual / theoretical benefit of your proposed archi
- The idea to address the stability of LapPE is novel, with SPE being a universal basis invariant architecture. - The motivation is well established, the difference to other related works is concise, the propositions are described well and the strength of SPE as a universal basis invariant architecture is presented thoroughly. - The experiments show the improvement in generalisation for SPE and its improved capabilities in recognising substructures, which are interesting outcomes of the archit
- The novelty of the method itself is partially limited as the idea to use a weighted correlation over the eigenvectors closely resembles the correlation used in BasisNet. - The experiments are limited. The performance of SPE in Table 1 is sub-par, and details of the experimental results in Figure 2 are unclear and the hyperparameters seem not to be reported, which makes it hard to reproduce the experiments. - The point regarding the trade-off between expressivity and generalisation is unclear
- S1. I like the design of the experiment, in that the authors validate the stability of SPE through out-of-distribution generalization. - S2. The authors target the robustness/instability and generalization of PEs, which is novel in the literature. - S3. The paper is overall well written.
> W1. The instability of prior methods (i.e., the so-called hard partition method) is not proved. The authors point out under Eq.2 that **hard partition** is induced when $[\phi_{\ell}(\boldsymbol{\lambda})]_j=\mathbb{1}$ (other places in $\phi(\cdot)$ are zeros), and then $\boldsymbol{V}\text{diag}(\phi_{\ell}(\boldsymbol{\lambda}))\boldsymbol{V}^{T}$ is the $\ell$-th subspace. The problem is that, if we set $\{ \phi_i \}_{i=1}^{m}$ that induce the hard partitions, then they are **c
Taxonomy
Topics: Machine Learning in Materials Science · Advanced Graph Neural Networks · Advanced Memory and Neural Computing
