NG+: A Multi-Step Matrix-Product Natural Gradient Method for Deep Learning

Minghan Yang; Dong Xu; Qiwen Cui; Zaiwen Wen; Pengxiang Xu

arXiv:2106.07454 · math.OC · June 15, 2021 · 1 citation

TL;DR

NG+ introduces a novel multi-step matrix-product natural gradient method that efficiently approximates second-order information for deep learning, demonstrating competitive performance across various tasks.

Contribution

The paper proposes NG+, a new second-order optimization method using a generalized Fisher information matrix in matrix form, with controlled computational cost and theoretical convergence guarantees.

Findings

1. NG+ achieves competitive results on image classification, quantum chemistry, translation, and recommendation tasks.

2. The method maintains a fixed GFIM over multiple steps, reducing computational overhead.

3. Global convergence and regret bounds are established under mild conditions.

Abstract

In this paper, a novel second-order method called NG+ is proposed. Following the rule "the shape of the gradient equals the shape of the parameter", we define a generalized Fisher information matrix (GFIM) using products of gradients in matrix form rather than the traditional vectorization. Our generalized natural gradient direction is then simply the inverse of the GFIM multiplied by the gradient in matrix form. Moreover, the GFIM and its inverse are kept fixed for multiple steps, so that the computational cost is controlled and comparable with that of first-order methods. Global convergence is established under some mild conditions, and a regret bound is given for the online learning setting. Numerical results on image classification with ResNet50, quantum chemistry modeling with SchNet, neural machine translation with Transformer and recommendation system with…
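The idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes one plausible reading in which the GFIM for an m×n parameter is the average of G_i G_i^T over per-sample gradient matrices G_i, adds a damping term so the inverse exists, and uses a toy noiseless linear regression. All names (`gfim_inverse`, `ng_direction`, `train`, `refresh_every`) and the values of `lr` and `damping` are hypothetical choices for illustration.

```python
import numpy as np

def per_sample_grads(W, X, Y):
    """Per-sample gradients of 0.5 * ||W x_i - y_i||^2, shape (N, m, n)."""
    R = X @ W.T - Y                        # residuals, shape (N, m)
    return R[:, :, None] * X[:, None, :]   # outer products r_i x_i^T

def loss(W, X, Y):
    """Mean squared-error loss matching the gradients above."""
    return 0.5 * np.mean(np.sum((X @ W.T - Y) ** 2, axis=1))

def gfim_inverse(grads, damping=1.0):
    """Inverse of the damped matrix-form GFIM (assumed form: mean of G_i G_i^T)."""
    n_samples, m, _ = grads.shape
    F = np.einsum("imk,ilk->ml", grads, grads) / n_samples  # (m, m), not (mn, mn)
    return np.linalg.inv(F + damping * np.eye(m))           # damping is an assumption

def ng_direction(F_inv, grad):
    """Matrix product F^{-1} G; the result has the same shape as the parameter."""
    return F_inv @ grad

def train(X, Y, steps=300, refresh_every=10, lr=0.5, damping=1.0):
    """Preconditioned descent that reuses the cached inverse between refreshes."""
    W = np.zeros((Y.shape[1], X.shape[1]))
    F_inv = None
    for t in range(steps):
        grads = per_sample_grads(W, X, Y)
        if t % refresh_every == 0:
            # The only step that builds and inverts the (m, m) matrix.
            F_inv = gfim_inverse(grads, damping)
        W = W - lr * ng_direction(F_inv, grads.mean(axis=0))
    return W
```

Between refreshes each step costs one extra m×m by m×n matrix product on top of the gradient, which is the sense in which the overhead stays close to a first-order method in this sketch. Calling `train(X, Y)` on data with `X` of shape (N, n) and `Y` of shape (N, m) returns a weight matrix of shape (m, n).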

Code & Models

Repositories

yangorwell/NGPlus (PyTorch, official)

Taxonomy

Topics: Machine Learning and ELM · Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Label Smoothing · Residual Connection · Dense Connections