Tensor Normal Training for Deep Learning Models

Yi Ren; Donald Goldfarb

arXiv:2106.02925 · cs.LG · December 23, 2021

TL;DR

Tensor Normal Training (TNT) is a second-order optimization method for deep learning that uses a tensor normal distribution assumption to approximate the Fisher matrix efficiently, with the goal of faster training and better generalization.

Contribution

TNT is the first method to use tensor normal distribution assumptions for efficient natural gradient approximation in deep learning training.

Findings

01. TNT outperforms first-order methods in optimization speed.

02. TNT matches the performance of state-of-the-art second-order methods.

03. TNT requires only slightly more memory and computation than first-order methods.
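To give a rough sense of why a Kronecker-structured curvature approximation can stay close to first-order memory cost, the sketch below counts stored entries for a single parameter tensor. It assumes one square covariance factor per tensor dimension, which is how a tensor normal covariance is structured. The function name `curvature_storage` and the example shape are hypothetical, and the counts are illustrative arithmetic, not measurements from the paper.

```python
def curvature_storage(shape):
    """Count stored entries for one parameter tensor of the given shape.

    Assumption (illustrative only): the Kronecker-structured approximation
    keeps one d_i x d_i factor per tensor dimension d_i.
    """
    n_params = 1
    for d in shape:
        n_params *= d
    return {
        # Entries in the parameter tensor itself (first-order baseline).
        "parameters": n_params,
        # Entries in a dense curvature block over all parameters of the tensor.
        "full_block": n_params ** 2,
        # Entries in the per-dimension factors.
        "kronecker_factors": sum(d * d for d in shape),
    }


if __name__ == "__main__":
    # Hypothetical 512 x 256 weight matrix.
    for name, count in curvature_storage((512, 256)).items():
        print(f"{name}: {count}")
```

For the hypothetical 512 x 256 matrix, the two factors together hold 327,680 entries, which is the same order of magnitude as the 131,072 parameters, while a dense block over the same parameters would need 131,072 squared entries.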

Abstract

Despite the predominant use of first-order methods for training deep learning models, second-order methods, and in particular natural gradient methods, remain of interest because of their potential for accelerating training through the use of curvature information. Several methods with non-diagonal preconditioning matrices, including KFAC, Shampoo, and K-BFGS, have been proposed and shown to be effective. Based on the so-called tensor normal (TN) distribution, we propose and analyze a brand new approximate natural gradient method, Tensor Normal Training (TNT), which, like Shampoo, only requires knowledge of the shape of the training parameters. By approximating the probabilistically based Fisher matrix, as opposed to the empirical Fisher matrix, our method uses the block-wise covariance of the sampling-based gradient as the preconditioning matrix. Moreover, the assumption that the…
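The idea of preconditioning with a shape-only, Kronecker-structured covariance can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `estimate_factors` and `tn_precondition`, the `damping` parameter, and the simple per-dimension second-moment estimator are all assumptions made for illustration, and the paper's actual estimator and update rule may differ.

```python
import numpy as np


def estimate_factors(sampled_grads):
    """Estimate one covariance-like factor per tensor dimension.

    sampled_grads has shape (n_samples, d_1, ..., d_k). For each dimension,
    every sample is unfolded into a (d_i, rest) matrix and the outer products
    are averaged. This is a simple illustrative estimator (an assumption),
    not necessarily the one used in the paper.
    """
    n_samples = sampled_grads.shape[0]
    factors = []
    for mode in range(1, sampled_grads.ndim):
        unfolded = np.moveaxis(sampled_grads, mode, 1)
        unfolded = unfolded.reshape(n_samples, unfolded.shape[1], -1)
        second_moment = np.einsum("nij,nkj->ik", unfolded, unfolded)
        factors.append(second_moment / (n_samples * unfolded.shape[2]))
    return factors


def tn_precondition(grad, factors, damping=1e-3):
    """Apply the inverse of each damped factor along its tensor dimension.

    Equivalent to solving with the Kronecker product of the damped factors,
    without ever forming that large matrix.
    """
    out = grad
    for mode, factor in enumerate(factors):
        damped = factor + damping * np.eye(factor.shape[0])
        moved = np.moveaxis(out, mode, 0)
        shape = moved.shape
        solved = np.linalg.solve(damped, moved.reshape(shape[0], -1))
        out = np.moveaxis(solved.reshape(shape), 0, mode)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grads = rng.standard_normal((8, 3, 4))  # 8 sampled gradients of a 3 x 4 weight
    factors = estimate_factors(grads)
    step = tn_precondition(grads.mean(axis=0), factors, damping=0.1)
    print(step.shape)
```

Only the shape of the parameter tensor is needed to set up the factors, which matches the abstract's remark that TNT, like Shampoo, requires no knowledge of the layer type. For a matrix parameter, the dimension-wise solves give the same result as solving against `np.kron` of the two damped factors applied to the row-major flattened gradient.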

Code & Models

Repositories

renyiryry/tnt_neurips_2021 · PyTorch · Official

Videos

Tensor Normal Training for Deep Learning Models · SlidesLive

Taxonomy

Topics: Tensor decomposition and applications · Advanced Neural Network Applications · Model Reduction and Neural Networks

Methods: Distributed Shampoo · Transformer in Transformer