Regularizing Deep Neural Networks with Stochastic Estimators of Hessian Trace

Yucong Liu; Shixing Yu; Tong Lin

arXiv:2208.05924·cs.LG·February 23, 2023

TL;DR

This paper introduces a regularization technique for deep neural networks that penalizes the trace of the Hessian matrix. The penalty is computed with an efficient stochastic estimator and is intended to improve generalization by steering training toward flat minima.

Contribution

The paper proposes a novel Hessian trace regularizer for deep neural networks. The trace is estimated with Hutchinson's method, accelerated by a dropout scheme, and the authors report that the regularizer outperforms existing regularizers and data augmentation methods.

Findings

01 Outperforms existing regularizers and data augmentation methods

02 Enhances generalization by promoting flat minima

03 Efficient Hessian trace estimation with dropout

Abstract

In this paper, we develop a novel regularization method for deep neural networks by penalizing the trace of Hessian. This regularizer is motivated by a recent guarantee bound of the generalization error. We explain its benefits in finding flat minima and avoiding Lyapunov stability in dynamical systems. We adopt the Hutchinson method as a classical unbiased estimator for the trace of a matrix and further accelerate its calculation using a dropout scheme. Experiments demonstrate that our method outperforms existing regularizers and data augmentation methods, such as Jacobian, Confidence Penalty, Label Smoothing, Cutout, and Mixup.
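
The Hutchinson method mentioned in the abstract rests on the identity E[v^T A v] = tr(A) for any random vector v with E[v v^T] = I, so the trace can be estimated from matrix-vector products alone. The following is a minimal sketch of that plain estimator, not the authors' implementation: the names `hutchinson_trace` and `dense_matvec` are hypothetical, the small matrix `A` is an illustrative stand-in for a Hessian, and the dropout acceleration described in the paper is not reproduced here.

```python
import random


def hutchinson_trace(matvec, dim, num_probes=100, seed=0):
    """Estimate tr(A) using only matrix-vector products v -> A v.

    `matvec` is a hypothetical callable standing in for a
    Hessian-vector product; `dim` is the length of the probe vector.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_probes):
        # Rademacher probe: each entry is +1 or -1 with equal probability,
        # which satisfies E[v v^T] = I.
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        av = matvec(v)
        # Accumulate the quadratic form v^T A v.
        total += sum(vi * avi for vi, avi in zip(v, av))
    return total / num_probes


def dense_matvec(matrix):
    """Wrap an explicit matrix (list of rows) as a matvec callable."""
    def matvec(v):
        return [sum(a * x for a, x in zip(row, v)) for row in matrix]
    return matvec


# Illustrative symmetric matrix (an assumption, not data from the paper).
A = [
    [2.0, 0.5, 0.0],
    [0.5, 3.0, -1.0],
    [0.0, -1.0, 4.0],
]

exact = sum(A[i][i] for i in range(len(A)))
estimate = hutchinson_trace(dense_matvec(A), dim=3, num_probes=2000)
print(f"exact trace = {exact}, Hutchinson estimate = {estimate:.3f}")
```

In a training setting, `matvec` would presumably be replaced by a Hessian-vector product obtained through automatic differentiation, so the full Hessian is never formed. The estimate is unbiased, and its variance shrinks as `num_probes` grows.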

Code & Models

Repositories

iclrsubmission1596/regularizing-deep-neural-networks-with-stochastic-estimators-of-hessian-trace
PyTorch · Official

Taxonomy

Topics: Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques

Methods: Label Smoothing · Mixup · Cutout · Dropout