On Over-Squashing in Message Passing Neural Networks: The Impact of Width, Depth, and Topology
Francesco Di Giovanni, Lorenzo Giusti, Federico Barbero, Giulia Luise, Pietro Liò, Michael Bronstein

TL;DR
This paper provides a theoretical analysis of over-squashing in Message Passing Neural Networks, revealing how width, depth, and topology influence this phenomenon and offering insights into mitigation strategies like graph rewiring.
Contribution
It offers a unified theoretical framework explaining over-squashing, highlighting the roles of network width, depth, and graph topology, and justifies graph rewiring methods.
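To make "graph rewiring" concrete, here is a minimal sketch of one generic rewiring heuristic. It greedily adds the missing edge that most lowers the total effective resistance (Kirchhoff index), a quantity proportional to the sum of commute times. This is an assumption for illustration only: it is not the paper's method, and the helpers `total_resistance`, `greedy_rewire`, and `barbell` are hypothetical names.

```python
import itertools
import numpy as np

def laplacian(adj):
    # Combinatorial graph Laplacian L = D - A.
    return np.diag(adj.sum(axis=1)) - adj

def total_resistance(adj):
    # Kirchhoff index: sum of effective resistances over all node pairs,
    # computed from the Moore-Penrose pseudoinverse of the Laplacian.
    lp = np.linalg.pinv(laplacian(adj))
    d = np.diag(lp)
    r = d[:, None] + d[None, :] - 2 * lp
    return r.sum() / 2

def greedy_rewire(adj, num_edges):
    # Hypothetical greedy rewiring: repeatedly add the missing edge that
    # most lowers the total effective resistance. Not the paper's algorithm.
    adj = adj.copy()
    added = []
    n = adj.shape[0]
    for _ in range(num_edges):
        best, best_score = None, total_resistance(adj)
        for i, j in itertools.combinations(range(n), 2):
            if adj[i, j] == 0:
                adj[i, j] = adj[j, i] = 1  # try the candidate edge
                score = total_resistance(adj)
                adj[i, j] = adj[j, i] = 0  # undo it
                if score < best_score:
                    best, best_score = (i, j), score
        if best is None:
            break
        i, j = best
        adj[i, j] = adj[j, i] = 1
        added.append(best)
    return adj, added

def barbell(k):
    # Two k-cliques joined by a single bridge edge (k - 1, k).
    n = 2 * k
    adj = np.zeros((n, n))
    for block in (range(k), range(k, n)):
        for i, j in itertools.combinations(block, 2):
            adj[i, j] = adj[j, i] = 1
    adj[k - 1, k] = adj[k, k - 1] = 1
    return adj

adj = barbell(4)
new_adj, added = greedy_rewire(adj, 1)
print(added, total_resistance(adj), total_resistance(new_adj))
```

On a barbell graph the only missing edges cross the bottleneck, so any added edge shortens the "effective distance" between the two halves. The exhaustive search is quadratic in the number of nodes per step and is meant only to illustrate the idea.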
Findings
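Increasing width can mitigate over-squashing, but at the cost of making the whole network more sensitive.
Increasing depth does not help: with more layers, over-squashing becomes dominated by vanishing gradients.
Graph topology plays the dominant role: over-squashing occurs between nodes with high commute (access) time.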
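The commute time between two nodes is the expected number of steps a random walk takes to go from one node to the other and back. A standard identity expresses it as C(u, v) = 2|E| · R(u, v), where R is the effective resistance, computable from the pseudoinverse of the graph Laplacian. The sketch below is illustrative; the function name `commute_times` and the barbell test graph are assumptions, not taken from the paper.

```python
import itertools
import numpy as np

def commute_times(adj):
    # Commute time C(u, v) = 2|E| * R(u, v), where R is the effective
    # resistance obtained from the Laplacian pseudoinverse L^+:
    # R(u, v) = L^+_uu + L^+_vv - 2 L^+_uv.
    lap = np.diag(adj.sum(axis=1)) - adj
    lp = np.linalg.pinv(lap)
    d = np.diag(lp)
    resistance = d[:, None] + d[None, :] - 2 * lp
    num_edges = adj.sum() / 2
    return 2 * num_edges * resistance

# Barbell graph: two 4-cliques (nodes 0-3 and 4-7) joined by the bridge (3, 4).
adj = np.zeros((8, 8))
for block in (range(4), range(4, 8)):
    for i, j in itertools.combinations(block, 2):
        adj[i, j] = adj[j, i] = 1
adj[3, 4] = adj[4, 3] = 1

ct = commute_times(adj)
print(ct[0, 1], ct[0, 7])
```

Nodes inside the same clique have a small commute time, while nodes on opposite sides of the bridge have a much larger one; in the paper's terms, the latter pair is where over-squashing is expected.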
Abstract
Message Passing Neural Networks (MPNNs) are instances of Graph Neural Networks that leverage the graph to send messages over the edges. This inductive bias leads to a phenomenon known as over-squashing, where a node feature is insensitive to information contained at distant nodes. Despite recent methods introduced to mitigate this issue, an understanding of the causes of over-squashing and of possible solutions is lacking. In this theoretical work, we prove that: (i) Neural network width can mitigate over-squashing, but at the cost of making the whole network more sensitive; (ii) Conversely, depth cannot help mitigate over-squashing: increasing the number of layers leads to over-squashing being dominated by vanishing gradients; (iii) The graph topology plays the greatest role, since over-squashing occurs between nodes at high commute (access) time. Our analysis provides a unified…
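Over-squashing is measured through the sensitivity of a node's output to another node's input, i.e. the Jacobian entry ∂h_v^(m)/∂x_u. The sketch below assumes a deliberately simplified linear MPNN with scalar features and a single shared weight `w` (no nonlinearity, no width), h^(t+1) = w · Â h^(t) with Â = D^{-1/2}(A + I)D^{-1/2}. For this toy model the Jacobian is exactly w^m Â^m. The names `normalized_adjacency`, `linear_mpnn`, and `sensitivity` are hypothetical, and this is not the paper's general bound.

```python
import numpy as np

def normalized_adjacency(adj):
    # GCN-style normalisation with self-loops: D^{-1/2} (A + I) D^{-1/2}.
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def linear_mpnn(x, a_hat, w, m):
    # m layers of h <- w * A_hat h (scalar features, no nonlinearity).
    h = x
    for _ in range(m):
        h = w * (a_hat @ h)
    return h

def sensitivity(a_hat, w, m):
    # For this linear model the Jacobian dh^(m)/dx is exactly w^m * A_hat^m,
    # so entry (v, u) measures how strongly node v "hears" node u after m layers.
    return (w ** m) * np.linalg.matrix_power(a_hat, m)

# Path graph 0 - 1 - 2 - 3 - 4 - 5.
n = 6
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1

a_hat = normalized_adjacency(adj)
s = sensitivity(a_hat, w=1.0, m=5)
print(s[0, 1], s[0, 5])
```

After five layers node 0 can receive information from node 5, but the corresponding Jacobian entry is far smaller than for its neighbour, because only one walk connects the endpoints. Scaling `w` multiplies every entry by the same factor w^m, which loosely mirrors the abstract's point (i): larger weights raise sensitivity everywhere rather than only between distant nodes.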
