Orthogonalising gradients to speed up neural network optimisation
Mark Tuddenham, Adam Prügel-Bennett, Jonathan Hare

TL;DR
This paper introduces a gradient orthogonalisation technique to accelerate neural network training by promoting diverse representations, leading to faster convergence on datasets like ImageNet and CIFAR-10 without sacrificing accuracy.
Contribution
The paper proposes a gradient orthogonalisation method that speeds up neural network optimisation. Unlike approaches that restrict the weights to an orthogonal subspace, it orthogonalises only the gradients, so the weights themselves remain free to be used flexibly.
Findings
Significant reduction in training time on ImageNet and CIFAR-10.
Speed-up observed in semi-supervised learning with BarlowTwins.
Accuracy similar to SGD without fine-tuning, and better than SGD when hyper-parameters are chosen naïvely.
Abstract
The optimisation of neural networks can be sped up by orthogonalising the gradients before the optimisation step, ensuring the diversification of the learned representations. We orthogonalise the gradients of each layer's components/filters with respect to each other to separate out the intermediate representations. Our method of orthogonalisation allows the weights to be used more flexibly, in contrast to restricting the weights to an orthogonalised sub-space. We tested this method on ImageNet and CIFAR-10, resulting in a large decrease in learning time, and also obtained a speed-up on the semi-supervised learning method BarlowTwins. We obtain similar accuracy to SGD without fine-tuning and better accuracy for naïvely chosen hyper-parameters.
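The abstract describes the core step only in words. Below is a minimal NumPy sketch of one way to orthogonalise a layer's filter gradients with respect to each other before a plain SGD step. The SVD-based construction (replacing the gradient G = U S Vᵀ by U Vᵀ), the function names `orthogonalise_gradient` and `sgd_step`, and the learning rate are illustrative assumptions, not the authors' published implementation.

```python
import numpy as np


def orthogonalise_gradient(grad: np.ndarray) -> np.ndarray:
    """Return an orthogonalised copy of a layer's gradient (hypothetical helper).

    Each output filter/component becomes one row after flattening. Using the
    SVD G = U S V^T and returning U V^T (the nearest semi-orthogonal matrix in
    Frobenius norm) makes the rows mutually orthonormal whenever
    num_filters <= fan_in. This SVD choice is an assumption for illustration.
    """
    shape = grad.shape
    g = grad.reshape(shape[0], -1)  # (num_filters, fan_in)
    u, _, vt = np.linalg.svd(g, full_matrices=False)
    # Dropping the singular values equalises the step size across directions.
    return (u @ vt).reshape(shape)


def sgd_step(weight: np.ndarray, grad: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """Plain SGD update using the orthogonalised gradient (hypothetical helper)."""
    return weight - lr * orthogonalise_gradient(grad)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A conv-style gradient: 8 filters, each 3 x 3 x 3 (fan_in = 27).
    G = rng.standard_normal((8, 3, 3, 3))
    O = orthogonalise_gradient(G).reshape(8, -1)
    # Rows (one per filter) are now orthonormal.
    print(np.allclose(O @ O.T, np.eye(8)))
```

Because U Vᵀ discards the singular values, every update has the same spectral scale regardless of the raw gradient's magnitude, so the learning rate would likely need retuning relative to plain SGD; the sketch makes no claim about how the paper handles this.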
Taxonomy
Topics: Advanced Neural Network Applications · Medical Image Segmentation Techniques · Domain Adaptation and Few-Shot Learning
Methods: Stochastic Gradient Descent
