Variance Reduction in Deep Learning: More Momentum is All You Need

Lionel Tondji; Sergii Kashubin; Moustapha Cisse

arXiv:2111.11828·cs.LG·November 24, 2021

TL;DR

This paper introduces a multi-momentum variance reduction technique for deep learning that converges faster than vanilla methods on CIFAR and ImageNet, is robust to label noise, and is suitable for distributed optimization.

Contribution

The paper proposes a family of scalable variance reduced optimization procedures that combine existing optimizers (e.g., SGD+Momentum) with a multi-momentum strategy, exploiting the clustering structure of datasets used in deep learning.

Findings

01. Faster convergence than vanilla methods on CIFAR and ImageNet

02. Robust to label noise in training

03. Suitable for distributed optimization environments

Abstract

Variance reduction (VR) techniques have contributed significantly to accelerating learning with massive datasets in the smooth and strongly convex setting (Schmidt et al., 2017; Johnson & Zhang, 2013; Roux et al., 2012). However, such techniques have not yet met the same success in the realm of large-scale deep learning due to various factors such as the use of data augmentation or regularization methods like dropout (Defazio & Bottou, 2019). This challenge has recently motivated the design of novel variance reduction techniques tailored explicitly for deep learning (Arnold et al., 2019; Ma & Yarats, 2018). This work is an additional step in this direction. In particular, we exploit the ubiquitous clustering structure of rich datasets used in deep learning to design a family of scalable variance reduced optimization procedures by combining existing optimizers (e.g., SGD+Momentum, Quasi…
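The abstract is cut off before it states the update rule, so the following is only a minimal sketch of what "multi-momentum" could look like, not the authors' algorithm. It assumes one momentum buffer per data cluster, a known cluster label for every example, equally sized clusters (so the buffers can be averaged without weights), and a toy scalar least-squares objective. The function name `multi_momentum_sgd` and all of its parameters are hypothetical.

```python
import random


def multi_momentum_sgd(data, clusters, num_clusters,
                       lr=0.1, beta=0.9, steps=2000, seed=0):
    """Toy sketch: minimize the mean of 0.5 * (w - x_i)**2 over scalars x_i.

    ASSUMPTION: this update rule is illustrative only; the paper's exact
    procedure is not given in the truncated abstract.
    """
    rng = random.Random(seed)
    w = 0.0
    # One momentum buffer per cluster instead of a single global buffer.
    buffers = [0.0] * num_clusters
    for _ in range(steps):
        i = rng.randrange(len(data))   # sample one example uniformly
        c = clusters[i]                # its (assumed known) cluster label
        grad = w - data[i]             # gradient of 0.5 * (w - x_i)**2
        # Update only the buffer of the sampled example's cluster.
        buffers[c] = beta * buffers[c] + (1.0 - beta) * grad
        # Step along the average of all buffers (assumes equal cluster sizes).
        direction = sum(buffers) / num_clusters
        w -= lr * direction
    return w, buffers


# Two well-separated, equally sized clusters whose overall mean is 0.
data = [-1.1, -1.0, -0.9, 0.9, 1.0, 1.1]
clusters = [0, 0, 0, 1, 1, 1]
w, buffers = multi_momentum_sgd(data, clusters, num_clusters=2)
print(w, buffers)
```

In this toy setting each buffer tracks the average gradient of its own cluster, so the large between-cluster difference in gradients is absorbed by the buffers rather than showing up as noise in the step direction. The iterate `w` should settle near the data mean (0 here), with the two buffers taking opposite signs.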

Taxonomy

Topics: Advanced Neural Network Applications · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning

Methods: Dropout