Are GATs Out of Balance?
Nimrah Mustafa, Aleksandar Bojchevski, Rebekka Burkholz

TL;DR
This paper investigates the training dynamics of Graph Attention Networks (GATs), revealing why they struggle to learn effectively, especially in deeper architectures, and proposes an initialization method to improve their trainability and convergence.
Contribution
It derives a conservation law for GAT gradient flow, explaining training difficulties, and introduces a balanced initialization scheme to enhance training efficiency and depth scalability.
Findings
With standard initialization, a large portion of GAT parameters barely changes during training.
This effect is amplified in deeper GATs, which perform significantly worse than their shallow counterparts.
The proposed balanced initialization improves gradient propagation and speeds up training.
Abstract
While the expressive power and computational capabilities of graph neural networks (GNNs) have been theoretically studied, their optimization and learning dynamics, in general, remain largely unexplored. Our study undertakes the Graph Attention Network (GAT), a popular GNN architecture in which a node's neighborhood aggregation is weighted by parameterized attention coefficients. We derive a conservation law of GAT gradient flow dynamics, which explains why a high portion of parameters in GATs with standard initialization struggle to change during training. This effect is amplified in deeper GATs, which perform significantly worse than their shallow counterparts. To alleviate this problem, we devise an initialization scheme that balances the GAT network. Our approach i) allows more effective propagation of gradients and in turn enables trainability of deeper networks, and ii) attains a…
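For readers unfamiliar with the architecture, the attention-weighted neighborhood aggregation described in the abstract can be sketched as follows. This is a minimal single-head illustration of the standard GAT formulation in NumPy, not the authors' code and not their initialization scheme. The function name `gat_layer`, the split of the attention vector into `a_src` and `a_dst`, and the toy graph are assumptions made for this example.

```python
import numpy as np

def gat_layer(H, A, W, a_src, a_dst, negative_slope=0.2):
    """Single-head GAT-style aggregation (illustrative sketch, hypothetical helper).

    H: (n, d_in) node features.
    A: (n, n) adjacency matrix with self-loops (nonzero entry = edge).
    W: (d_in, d_out) feature weights.
    a_src, a_dst: (d_out,) attention parameters, i.e. the two halves of the
        attention vector applied to the concatenation [z_i || z_j].
    """
    Z = H @ W  # transformed node features, shape (n, d_out)
    # Raw score e[i, j] = LeakyReLU(a_src . z_i + a_dst . z_j)
    e = (Z @ a_src)[:, None] + (Z @ a_dst)[None, :]
    e = np.where(e > 0, e, negative_slope * e)
    # Mask non-neighbors so they receive zero attention after the softmax
    e = np.where(A > 0, e, -np.inf)
    # Row-wise softmax over each node's neighborhood (shifted for stability)
    e = e - e.max(axis=1, keepdims=True)
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    # Each node's output is the attention-weighted sum of its neighbors' features
    return alpha @ Z, alpha

# Toy path graph 0 - 1 - 2 with self-loops (assumed data for illustration)
rng = np.random.default_rng(0)
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)
H = rng.normal(size=(3, 4))
W = rng.normal(size=(4, 2))
a_src = rng.normal(size=2)
a_dst = rng.normal(size=2)
out, alpha = gat_layer(H, A, W, a_src, a_dst)
```

Here `W` and the attention parameters `a_src`, `a_dst` are the trainable quantities whose training dynamics the paper studies. Each row of `alpha` sums to one and is zero outside the node's neighborhood.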
Taxonomy
Topics: Advanced Graph Neural Networks · Neural Networks and Applications · Stochastic Gradient Optimization Techniques
Methods: Graph Attention Network
