Orthogonalising gradients to speed up neural network optimisation

Mark Tuddenham; Adam Prügel-Bennett; Jonathan Hare

arXiv:2202.07052·cs.LG·February 16, 2022

TL;DR

This paper orthogonalises gradients before the optimisation step to diversify the learned representations, which speeds up training on ImageNet and CIFAR-10 while reaching accuracy similar to SGD.

Contribution

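The paper proposes orthogonalising the gradients of a layer's components/filters with respect to each other before the optimisation step. Because only the gradients are orthogonalised, the weights are not restricted to an orthogonalised sub-space and can be used more flexibly.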

Findings

1. Large decrease in learning time on ImageNet and CIFAR-10.

2. Speed-up on semi-supervised learning with BarlowTwins.

3. Accuracy similar to SGD without fine-tuning, and better accuracy for naïvely chosen hyper-parameters.

Abstract

The optimisation of neural networks can be sped up by orthogonalising the gradients before the optimisation step, ensuring the diversification of the learned representations. We orthogonalise the gradients of the layer's components/filters with respect to each other to separate out the intermediate representations. Our method of orthogonalisation allows the weights to be used more flexibly, in contrast to restricting the weights to an orthogonalised sub-space. We tested this method on ImageNet and CIFAR-10 resulting in a large decrease in learning time, and also obtain a speed-up on the semi-supervised learning BarlowTwins. We obtain similar accuracy to SGD without fine-tuning and better accuracy for naïvely chosen hyper-parameters.
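The idea in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not say how the orthogonalisation is computed, so the SVD-based step here (replacing the gradient matrix with its nearest matrix with orthonormal rows) is an assumption, and the function names `orthogonalise_gradient` and `sgd_step` are hypothetical rather than taken from the authors' repository.

```python
import numpy as np


def orthogonalise_gradient(grad):
    """Make the per-filter gradients mutually orthonormal (illustrative sketch).

    Assumption: orthogonalisation is done via SVD, G = U S V^T -> U V^T.
    The abstract does not specify the procedure.
    """
    n_filters = grad.shape[0]
    flat = grad.reshape(n_filters, -1)  # one row per filter
    u, _, vt = np.linalg.svd(flat, full_matrices=False)
    return (u @ vt).reshape(grad.shape)


def sgd_step(weight, grad, lr=0.1):
    """Plain SGD update that uses the orthogonalised gradient.

    Only the gradient is orthogonalised; the weights are left unconstrained.
    """
    return weight - lr * orthogonalise_gradient(grad)


# Toy convolutional layer: 4 filters, 3 input channels, 3x3 kernels.
rng = np.random.default_rng(0)
weight = rng.normal(size=(4, 3, 3, 3))
grad = rng.normal(size=weight.shape)

ortho = orthogonalise_gradient(grad).reshape(4, -1)
gram = ortho @ ortho.T  # close to the identity when filter gradients are orthonormal
new_weight = sgd_step(weight, grad)
```

In this sketch the Gram matrix of the flattened, orthogonalised gradient is approximately the identity, so each filter receives an update direction orthogonal to those of the other filters in the layer, while the weights themselves remain free to take any value.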

Code & Models

Repositories

MarkTuddenham/Orthogonal-Optimisers (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Medical Image Segmentation Techniques · Domain Adaptation and Few-Shot Learning

Methods: Stochastic Gradient Descent