AsymptoticNG: A regularized natural gradient optimization algorithm with look-ahead strategy

Zedong Tang; Fenlong Jiang; Junke Song; Maoguo Gong; Hao Li; Fan Yu; Zidong Wang; Min Wang

arXiv:2012.13077 · cs.LG · January 19, 2021

Open Access

TL;DR

This paper introduces AsymptoticNG (ANG), a regularized natural gradient (NG) optimizer with a look-ahead strategy that combines NG and Euclidean gradients, improving stability and generalization when training deep models.

Contribution

The paper proposes a novel optimizer, AsymptoticNG, which dynamically combines NG and Euclidean gradients with a look-ahead strategy to enhance stability and performance.

Findings

01

ANG updates parameters smoothly and stably while keeping second-order convergence speed.

02

ANG achieves better generalization on CIFAR10 and CIFAR100.

03

The method effectively combines NG and Euclidean gradients for improved optimization.

Abstract

Optimizers that further adjust the scale of the gradient, such as Adam and Natural Gradient (NG), are widely studied and used by the community, yet they are often found to generalize worse than Stochastic Gradient Descent (SGD). They tend to converge very well at the beginning of training but perform weakly toward the end. An immediate idea is to complement the strengths of these algorithms with SGD. However, abruptly switching optimizers often causes the update pattern to break down, and the new algorithm often requires many iterations to stabilize its search direction. Motivated by this idea and to address this problem, we design and present a regularized natural gradient optimization algorithm with a look-ahead strategy, named asymptotic natural gradient (ANG). According to the total iteration step, ANG dynamically assembles NG and Euclidean gradient, and updates parameters…
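
The idea of assembling NG and the Euclidean gradient according to the iteration step can be illustrated with a rough sketch. This is not the paper's algorithm: the abstract is truncated and does not give the weighting rule, so the linear schedule `blend_weight`, the diagonal Fisher approximation `fisher_diag`, the `damping` term, and all function names below are hypothetical choices made only for illustration. The look-ahead part of ANG is not modeled here.

```python
def blend_weight(step, total_steps):
    """Hypothetical linear schedule (an assumption, not from the paper):
    1.0 at step 0 (pure NG direction), 0.0 at total_steps (pure Euclidean gradient)."""
    return max(0.0, 1.0 - step / total_steps)


def blended_direction(grad, fisher_diag, step, total_steps, damping=1e-3):
    """Mix a damped diagonal-Fisher natural gradient with the plain gradient.

    grad        -- list of partial derivatives (Euclidean gradient)
    fisher_diag -- list of diagonal Fisher estimates, one per parameter (assumed given)
    """
    w = blend_weight(step, total_steps)
    direction = []
    for g, f in zip(grad, fisher_diag):
        ng = g / (f + damping)  # natural gradient under a diagonal Fisher approximation
        direction.append(w * ng + (1.0 - w) * g)
    return direction


def sgd_step(params, direction, lr):
    """Plain descent step along the blended direction."""
    return [p - lr * d for p, d in zip(params, direction)]


if __name__ == "__main__":
    params = [1.0, -2.0]
    grad = [2.0, -4.0]
    fisher_diag = [4.0, 4.0]
    for step in (0, 5, 10):
        d = blended_direction(grad, fisher_diag, step, total_steps=10, damping=0.0)
        print(step, d, sgd_step(params, d, lr=0.1))
```

Under this assumed schedule the update starts as a pure NG step and drifts gradually toward a plain gradient step, which avoids the abrupt optimizer switch that the abstract describes as destabilizing.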

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM

Methods: Stochastic Gradient Descent · Adam