SGD with Partial Hessian for Deep Neural Networks Optimization

Ying Sun; Hongwei Yong; Lei Zhang

arXiv:2403.02681 · cs.LG · March 6, 2024 · 1 citation

TL;DR

This paper introduces SGD with Partial Hessian (SGD-PH), a compound optimizer that updates channel-wise parameters with a second-order step based on a precise partial Hessian and updates all other parameters with first-order SGD, with the aim of making deep neural network training more stable and better performing.

Contribution

The paper proposes a compound optimizer, SGD-PH. It extracts the precise partial Hessian of the channel-wise parameters, which the authors show to be diagonal, and uses it for a second-order update of those parameters, while the remaining parameters are updated with SGD.

Findings

1. SGD-PH outperforms traditional optimizers on image classification tasks.
2. Partial Hessian information improves convergence stability.
3. The method maintains good generalization performance.

Abstract

Due to the effectiveness of second-order algorithms in solving classical optimization problems, designing second-order optimizers to train deep neural networks (DNNs) has attracted much research interest in recent years. However, because of the very high dimension of intermediate features in DNNs, it is difficult to directly compute and store the Hessian matrix for network optimization. Most of the previous second-order methods approximate the Hessian information imprecisely, resulting in unstable performance. In this work, we propose a compound optimizer, which is a combination of a second-order optimizer with a precise partial Hessian matrix for updating channel-wise parameters and the first-order stochastic gradient descent (SGD) optimizer for updating the other parameters. We show that the associated Hessian matrices of channel-wise parameters are diagonal and can be extracted…
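The compound update described in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' SGD-PH implementation: the toy loss, the function names `toy_grads` and `compound_step`, the absolute-value and `eps` safeguard, and the learning rate are all assumptions made for illustration. It only shows the general idea of scaling the gradients of one parameter group by an exact diagonal Hessian while applying plain SGD to the rest.

```python
# Illustrative sketch only; NOT the authors' SGD-PH implementation.
# Toy loss: L(s, w) = 0.5 * sum_i a[i] * s[i]**2 + 0.5 * sum_j (w[j] - 1)**2
# "s" stands in for channel-wise parameters, whose Hessian here is diag(a).
# "w" stands in for all remaining parameters.

def toy_grads(s, w, a):
    """Return (grad_s, hess_diag_s, grad_w) for the toy loss above."""
    grad_s = [ai * si for ai, si in zip(a, s)]
    hess_diag_s = list(a)  # exact diagonal Hessian with respect to s
    grad_w = [wj - 1.0 for wj in w]
    return grad_s, hess_diag_s, grad_w

def compound_step(s, w, grad_s, hess_diag_s, grad_w, lr=0.1, eps=1e-8):
    """One hypothetical compound update (names and safeguards are assumptions)."""
    # Newton-like step: divide each gradient entry by |h_i|;
    # eps avoids division by zero.
    new_s = [si - gi / (abs(hi) + eps)
             for si, gi, hi in zip(s, grad_s, hess_diag_s)]
    # Plain SGD step for every other parameter.
    new_w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
    return new_s, new_w

a = [1.0, 100.0]            # very different curvature per "channel"
s, w = [1.0, 1.0], [0.0]
s, w = compound_step(s, w, *toy_grads(s, w, a))
print(s, w)
```

In this toy problem the two entries of `s` have curvatures 1 and 100, so gradient descent with a single learning rate would need a rate below 2/100 to remain stable on the second entry and would then move slowly on the first. The Hessian-scaled step brings both entries close to the minimum in one update, while `w` takes an ordinary SGD step.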

Code & Models

Repositories

myingysun/sgdph (PyTorch, official)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques

Methods: Stochastic Gradient Descent