On the benefits of non-linear weight updates

Paul Norridge

arXiv:2207.12505·cs.LG·July 27, 2022

TL;DR

This paper introduces a non-linear gradient update method that explicitly balances weight changes to enhance the Signal-to-Noise Ratio, leading to improved generalization in deep neural networks.

Contribution

It proposes a novel non-linear transformation of gradients before updates, explicitly balancing weight changes to improve SNR and generalization performance.

Findings

01. Improved generalization performance across various tasks.

02. Explicit non-linear gradient transformation enhances optimizer effectiveness.

03. Better SNR optimization leads to more robust training outcomes.

Abstract

Recent work has suggested that the generalisation performance of a DNN is related to the extent to which the Signal-to-Noise Ratio is optimised at each of the nodes. In contrast, Gradient Descent methods do not always lead to SNR-optimal weight configurations. One way to improve SNR performance is to suppress large weight updates and amplify small weight updates. Such balancing is already implicit in some common optimizers, but we propose an approach that makes this explicit. The method applies a non-linear function to gradients prior to making DNN parameter updates. We investigate the performance with such non-linear approaches. The result is an adaptation to existing optimizers that improves performance for many problem types.
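
To make the idea of applying a non-linear function to gradients before the parameter update concrete, here is a minimal sketch. The abstract does not specify which function the paper uses, so the signed power transform below, the exponent `p`, and the names `power_transform` and `sgd_step` are all illustrative assumptions rather than the author's method. With `p < 1`, small gradient entries are amplified relative to the largest one, which compresses the spread of update sizes.

```python
import numpy as np


def power_transform(grad, p=0.5, eps=1e-12):
    """Hypothetical balancing function: sign(g) * |g|**p, rescaled.

    Magnitudes are normalised by the largest entry so they lie in [0, 1].
    Raising them to a power p < 1 lifts small entries towards the largest,
    while the largest entry itself keeps its original magnitude.
    """
    grad = np.asarray(grad, dtype=float)
    scale = np.max(np.abs(grad))
    if scale < eps:
        # An all-zero gradient has nothing to balance.
        return np.zeros_like(grad)
    normalised = np.abs(grad) / scale
    return np.sign(grad) * scale * normalised ** p


def sgd_step(weights, grad, lr=0.1, p=0.5):
    """Plain gradient-descent step using the transformed gradient (assumed setup)."""
    weights = np.asarray(weights, dtype=float)
    return weights - lr * power_transform(grad, p)


if __name__ == "__main__":
    g = np.array([1.0, 0.01, -0.25])
    print("raw gradient:        ", g)
    print("transformed gradient:", power_transform(g))
    print("updated weights:     ", sgd_step(np.zeros(3), g))
```

Setting `p = 1` recovers the ordinary linear update, so the exponent acts as a single knob for how strongly the updates are balanced. The same transform could in principle be placed in front of other optimizers, but that wiring is not shown here.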

Taxonomy

Topics: Machine Learning and ELM · Neural Networks and Applications · Stochastic Gradient Optimization Techniques