Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization

Zeke Xie, Li Yuan, Zhanxing Zhu, and Masashi Sugiyama

arXiv:2103.17182 · cs.LG · August 31, 2022 · 6 citations

TL;DR

This paper introduces Positive-Negative Momentum (PNM), an alternative to conventional momentum in classic optimizers that manipulates stochastic gradient noise to improve the generalization of deep networks. The authors provide theoretical analysis and report empirical improvements over conventional momentum.

Contribution

PNM maintains two approximately independent momentum terms and controls the magnitude of stochastic gradient noise explicitly through their difference. It does so at low computational cost and without changing the learning rate or batch size.

Findings

01. PNM outperforms traditional momentum in deep learning tasks.

02. Theoretical proof of convergence and generalization benefits.

03. Empirical results show significant accuracy improvements.

Abstract

It is well known that stochastic gradient noise (SGN) acts as implicit regularization for deep learning and is essential for both the optimization and the generalization of deep networks. Some works have attempted to simulate SGN artificially by injecting random noise to improve deep learning. However, simple injected random noise does not work as well as SGN, which is anisotropic and parameter-dependent. To simulate SGN at low computational cost and without changing the learning rate or batch size, we propose the Positive-Negative Momentum (PNM) approach, a powerful alternative to conventional Momentum in classic optimizers. The PNM method maintains two approximately independent momentum terms. We can then control the magnitude of SGN explicitly by adjusting the momentum difference. We theoretically prove the convergence guarantee and the…
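
The update described in the abstract can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the two momentum buffers are updated on alternating steps with decay `beta1 ** 2`, and that each step combines them as a positive term weighted `1 + beta0` and a negative term weighted `beta0`, rescaled by `sqrt((1 + beta0)**2 + beta0**2)`. The function name `pnm_minimize`, its parameters, and the default values are hypothetical; the official repository listed under Code & Models holds the reference version.

```python
import math


def pnm_minimize(grad_fn, theta0, lr=0.1, beta0=1.0, beta1=0.9, steps=200):
    """Hypothetical sketch of a positive-negative momentum update.

    grad_fn: callable mapping a list of floats to its (stochastic) gradient.
    theta0:  initial parameters as a list of floats.
    """
    theta = list(theta0)
    # Two momentum buffers. Each one is refreshed only every other step,
    # so the two buffers are built from disjoint sets of gradients.
    m_cur = [0.0] * len(theta)   # buffer refreshed at the latest step
    m_prev = [0.0] * len(theta)  # buffer refreshed one step earlier
    # Assumed rescaling factor for the positive-negative combination.
    norm = math.sqrt((1.0 + beta0) ** 2 + beta0 ** 2)

    for _ in range(steps):
        g = grad_fn(theta)
        # After the swap, m_cur holds the buffer from two steps ago,
        # which is the one to refresh with the new gradient.
        m_cur, m_prev = m_prev, m_cur
        m_cur = [beta1 ** 2 * m + (1.0 - beta1 ** 2) * gi
                 for m, gi in zip(m_cur, g)]
        # Positive weight on the fresh buffer, negative weight on the other.
        # A larger beta0 widens the momentum difference.
        theta = [t - (lr / norm) * ((1.0 + beta0) * mc - beta0 * mp)
                 for t, mc, mp in zip(theta, m_cur, m_prev)]
    return theta


if __name__ == "__main__":
    # Toy problem: f(x) = 0.5 * ||x||^2, whose gradient is x itself.
    result = pnm_minimize(lambda th: list(th), [5.0, -3.0], steps=500)
    print(result)
```

With `beta0 = 0` the negative term vanishes and the step uses a single buffer at a time, so in this sketch `beta0` is the knob that adjusts the momentum difference. The toy gradient here is deterministic; with minibatch gradients the two buffers would carry different noise samples.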

Code & Models

Repositories

zeke-xie/Positive-Negative-Momentum (PyTorch, official)

Videos

Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and ELM

Methods: SGD with Momentum · Stochastic Gradient Descent · Adam