On the benefits of non-linear weight updates
Paul Norridge

TL;DR
This paper introduces a non-linear gradient update method that explicitly balances weight changes to enhance the Signal-to-Noise Ratio, leading to improved generalization in deep neural networks.
Contribution
It proposes a novel non-linear transformation of gradients before updates, explicitly balancing weight changes to improve SNR and generalization performance.
Findings
Improved generalization performance across various tasks.
Explicit non-linear gradient transformation enhances optimizer effectiveness.
Better SNR optimization leads to more robust training outcomes.
Abstract
Recent work has suggested that the generalisation performance of a DNN is related to the extent to which the Signal-to-Noise Ratio (SNR) is optimised at each of the nodes. However, Gradient Descent methods do not always lead to SNR-optimal weight configurations. One way to improve SNR performance is to suppress large weight updates and amplify small weight updates. Such balancing is already implicit in some common optimizers, but we propose an approach that makes it explicit. The method applies a non-linear function to the gradients before the DNN parameters are updated. We investigate how such non-linear approaches perform. The result is an adaptation to existing optimizers that improves performance for many problem types.
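To make the idea of suppressing large updates and amplifying small ones concrete, here is a minimal sketch of one possible transform. The abstract does not say which non-linear function the paper uses, so the signed power law below, the names `balance_gradient` and `sgd_step`, and the values of `power` and `lr` are all assumptions chosen for illustration, not the author's method.

```python
import math


def balance_gradient(g, power=0.5):
    """Hypothetical transform: sign(g) * |g|**power (an assumption, not the paper's function).

    With 0 < power < 1, magnitudes above 1 shrink and magnitudes below 1 grow,
    so the spread between large and small updates is compressed.
    """
    return math.copysign(abs(g) ** power, g)


def sgd_step(weights, grads, lr=0.1, power=0.5):
    """Plain gradient-descent step applied to the transformed gradients."""
    return [w - lr * balance_gradient(g, power) for w, g in zip(weights, grads)]


# Toy example: one large and one small gradient component.
weights = [1.0, 1.0]
grads = [4.0, 0.01]
new_weights = sgd_step(weights, grads)

# Ratio of the two update sizes before and after the transform.
raw_ratio = abs(grads[0]) / abs(grads[1])
balanced_ratio = balance_gradient(grads[0]) / balance_gradient(grads[1])
```

In this toy case the raw gradients differ by a factor of about 400, while the transformed updates differ by a factor of about 20. The sign of each component is preserved, so the step direction per weight is unchanged and only the relative magnitudes are rebalanced.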
Taxonomy
Topics: Machine Learning and ELM · Neural Networks and Applications · Stochastic Gradient Optimization Techniques
