FSW-GNN: A Bi-Lipschitz WL-Equivalent Graph Neural Network
Yonatan Sverdlov, Yair Davidson, Nadav Dym, Tal Amir

TL;DR
This paper introduces FSW-GNN, a bi-Lipschitz message passing neural network (MPNN). It separates graphs more reliably and is more accurate on long-range tasks than standard WL-equivalent MPNNs, which are not lower-Lipschitz.
Contribution
The paper presents the first MPNN that is fully bi-Lipschitz with respect to standard WL-equivalent graph metrics, improving graph separation and long-range task performance over existing WL-equivalent models.
Findings
Competitive with standard MPNNs on several tasks
Much more accurate in long-range tasks
Avoids oversmoothing and oversquashing
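To make the oversmoothing finding concrete, here is a minimal NumPy sketch of the phenomenon as it is commonly described: repeated neighborhood averaging drives all node features toward the same value. This is not FSW-GNN and not an experiment from the paper. The path graph, the parameter-free mean aggregation, and the names `mean_aggregate` and `feature_spread` are assumptions chosen for illustration only.

```python
import numpy as np


def mean_aggregate(adj: np.ndarray, x: np.ndarray, steps: int) -> np.ndarray:
    """Apply `steps` rounds of parameter-free mean aggregation with self-loops."""
    a = adj + np.eye(adj.shape[0])        # add a self-loop to every node
    p = a / a.sum(axis=1, keepdims=True)  # row-normalize: each node averages its neighborhood
    for _ in range(steps):
        x = p @ x                         # one round of message passing
    return x


def feature_spread(x: np.ndarray) -> float:
    """Gap between the largest and smallest node feature."""
    return float(x.max() - x.min())


# Hypothetical toy input: a path graph on 6 nodes with features 0, 1, ..., 5.
n = 6
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
x0 = np.arange(n, dtype=float).reshape(-1, 1)

spread_shallow = feature_spread(mean_aggregate(adj, x0, steps=2))
spread_deep = feature_spread(mean_aggregate(adj, x0, steps=500))
print(spread_shallow, spread_deep)
```

After two rounds the node features are still clearly distinct, while after many rounds they are numerically almost identical, so whatever distinguished the two ends of the path is lost to any downstream layer.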
Abstract
Famously, the ability of Message Passing Neural Networks (MPNNs) to distinguish between graphs is limited to graphs separable by the Weisfeiler-Leman (WL) graph isomorphism test, and the strongest MPNNs, in terms of separation power, are WL-equivalent. However, it was demonstrated that the quality of separation provided by standard WL-equivalent MPNNs can be very low, resulting in WL-separable graphs being mapped to very similar, hardly distinguishable outputs. This phenomenon can be explained by the recent observation that standard MPNNs are not lower-Lipschitz. This paper addresses this issue by introducing FSW-GNN, the first MPNN that is fully bi-Lipschitz with respect to standard WL-equivalent graph metrics. Empirically, we show that our MPNN is competitive with standard MPNNs for several graph learning tasks and is far more accurate in long-range tasks, due to its ability to avoid…
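For readers unfamiliar with the term, the bi-Lipschitz property mentioned in the abstract can be written as a two-sided inequality. The notation below is an assumption made for illustration (the abstract does not fix symbols): f is the graph embedding computed by the network, ρ is a WL-equivalent graph metric, and c and C are constants that do not depend on the graphs being compared.

```latex
% Assumed notation: f maps graphs to R^m, \rho is a WL-equivalent graph metric,
% and 0 < c \le C < \infty are constants independent of G and G'.
% The choice of the Euclidean norm is also an assumption.
c \,\rho(G, G') \;\le\; \| f(G) - f(G') \|_2 \;\le\; C \,\rho(G, G')
\qquad \text{for all graphs } G, G'.
```

The right-hand inequality is the usual (upper) Lipschitz condition. The left-hand inequality is the lower-Lipschitz condition: graphs that are far apart under ρ cannot be mapped to nearly identical outputs, which is the failure mode the abstract attributes to standard MPNNs.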
Peer Reviews
Decision·Submitted to ICLR 2025
- The authors did an exceptionally good job in presenting and structuring the paper, which I enjoyed reading. The way the authors introduce the related work throughout the paper is very nice and detailed, as it gives a clear message to the readers about the contributions and the relations with previous works.
- FSW-GNN is one of the first GNNs to offer bi-Lipschitz guarantees with respect to two significant WL-metrics: the DS metric and Tree Mover’s Distance.
- The experimental results are pro
- My main concern with this paper is its close similarity to SortMPNN, as many ideas of the paper, such as the use of sorted message aggregation, are directly inspired by that work. While FSW-GNN introduces bi-Lipschitz guarantees, SortMPNN already established a similar approach to achieving Lipschitz properties in expectation. Furthermore, in the experimental results, SortMPNN actually outperforms FSW-GNN. The authors also did not provide a comparison with SortMPNN on some of the other datasets
1. The paper introduces the FSW-GNN, a novel graph neural network (GNN) model that achieves bi-Lipschitz continuity, which is a significant advancement over traditional MPNNs that lack such properties.
2. Empirical evaluations show that FSW-GNN performs competitively on standard graph tasks and excels in long-range tasks.
1. The FSW-GNN requires more complex operations, potentially increasing runtime compared to simpler GNN architectures like GCN and GIN.
2. FSW-GNN’s runtime is considerably higher for large datasets, which might limit its application in highly scalable scenarios.
3. While the paper provides proof for the bi-Lipschitz properties, it relies on empirical evidence and some conjecture to suggest that the model’s stability holds as depth increases.
1. Bi-Lipschitzness is an interesting idea for understanding the expressive power of MPNNs.
2. It is quite interesting to see that FSW-GNN is particularly strong on long-range tasks, meaning that enhancing bi-Lipschitzness is particularly effective for long-range tasks, where standard GNNs fall short.
Since bi-Lipschitz continuity is stated for graph embeddings, it is less clear from a theoretical perspective how it is connected with node embeddings and node-level tasks. In particular, the long-range tasks considered are node-level tasks. The intuition stated in lines 216-236 is also based on the graph level: I can see that "deep GNNs are bad for graph level tasks, and as a result through improving bi-Lipschitz continuity, FSW-GNN makes it less bad for graph level tasks", but it is har
Taxonomy
Topics: Advanced Graph Neural Networks · Neural Networks and Applications · Brain Tumor Detection and Classification
