Lipschitz Normalization for Self-Attention Layers with Application to Graph Neural Networks

George Dasoulas; Kevin Scaman; Aladin Virmaux

arXiv:2103.04886·cs.LG·September 14, 2021·5 cites

TL;DR

This paper introduces LipschitzNorm, a normalization technique for self-attention layers that enforces Lipschitz continuity, significantly improving the performance of deep graph neural networks by preventing gradient explosion and enabling deeper architectures.

Contribution

The paper proposes LipschitzNorm, a simple, parameter-free normalization method for self-attention modules, enhancing deep graph neural network training and performance.

Findings

01. LipschitzNorm improves deep GAT and Graph Transformer performance.
02. Deep GAT with LipschitzNorm achieves state-of-the-art results.
03. Normalization prevents gradient explosion in deep attention models.

Abstract

Attention-based neural networks are state of the art in a wide range of applications. However, their performance tends to degrade as the number of layers increases. In this work, we show that enforcing Lipschitz continuity by normalizing the attention scores can significantly improve the performance of deep attention models. First, we show that, for deep graph attention networks (GAT), gradient explosion appears during training, leading to poor performance of gradient-based training algorithms. To address this issue, we derive a theoretical analysis of the Lipschitz continuity of attention modules and introduce LipschitzNorm, a simple and parameter-free normalization for self-attention mechanisms that forces the model to be Lipschitz continuous. We then apply LipschitzNorm to GAT and Graph Transformers and show that their performance is substantially improved in the deep setting…
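
To make the idea of "normalizing the attention scores" concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's method: the function name `normalized_attention` and the particular scaling factor (built from the Frobenius norm of the queries and the largest row norms of the keys and values) are hypothetical choices made for this example, and the exact LipschitzNorm formula should be taken from the paper or the official repository. What the sketch shows is that dividing the raw dot-product scores by a norm-dependent factor keeps them bounded before the softmax, so the attention weights cannot saturate just because the inputs are large.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def normalized_attention(Q, K, V, eps=1e-12):
    """Dot-product attention with norm-rescaled scores.

    ASSUMPTION: the scaling factor below is an illustrative choice,
    not necessarily the exact normalization used by LipschitzNorm.
    Q, K, V are (n, d) arrays of queries, keys, and values.
    """
    scores = Q @ K.T                       # raw (n, n) attention scores
    u = np.linalg.norm(Q)                  # Frobenius norm of the queries
    v = np.linalg.norm(K, axis=1).max()    # largest key row norm
    w = np.linalg.norm(V, axis=1).max()    # largest value row norm
    scale = max(u * v, u * w, v * w, eps)  # eps guards against all-zero inputs
    # By Cauchy-Schwarz, |q_i . k_j| <= u * v <= scale, so every
    # rescaled score lies in [-1, 1] regardless of input magnitude.
    weights = softmax(scores / scale, axis=-1)
    return weights @ V, weights


# Usage: random inputs standing in for node features of a small graph.
rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 4))
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 4))
out, attn = normalized_attention(Q, K, V)
```

With this choice of factor, multiplying `Q`, `K`, and `V` by a common constant leaves the attention weights unchanged, which is one simple way to see why bounded scores help when activations grow with depth.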

Code & Models

Repositories

gdasoulas/lipschitznorm
PyTorch · Official

Videos

Lipschitz normalization for self-attention layers with application to graph neural networks · SlidesLive

Taxonomy

Topics: Advanced Graph Neural Networks · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning

Methods: Graph Attention Network