Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias

Ryo Karakida; Tomoumi Takase; Tomohiro Hayase; Kazuki Osawa

arXiv:2210.02720 · cs.LG · February 6, 2023 · 1 citation

TL;DR

This paper introduces a finite-difference method for computing gradient regularization (GR) in deep learning that lowers GR's computational cost and improves generalization, and it provides theoretical insight into the method's implicit bias and its connections to other algorithms.

Contribution

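It proposes computing gradient regularization (GR) with a finite difference composed of a gradient ascent step and a gradient descent step, which lowers GR's computational cost and improves generalization. It backs this with a theoretical analysis of GR's implicit bias in diagonal linear networks and relates finite-difference GR to existing algorithms.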
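
The taxonomy for this paper lists Sharpness-Aware Minimization (SAM) among related methods. Below is a minimal NumPy sketch of one relation that follows from algebra alone, assuming finite-difference GR takes the forward (ascent-step) form g + (γ/ε)(∇L(θ + εg) − g) with g = ∇L(θ). Under that assumption, choosing γ = ε collapses the update to ∇L(θ + εg), a SAM-style gradient taken at an ascent-perturbed point but without SAM's usual normalization of g. Since SAM normally perturbs by ρ·g/‖g‖, the two coincide when ρ = ε‖g‖. The loss, the helper names `fgr_grad` and `sam_grad`, and all constants are hypothetical illustrations, not code from the paper.

```python
import numpy as np

# Hypothetical smooth, non-quadratic loss: L(theta) = 0.5 theta^T A theta + 0.25 sum(theta^4).
A = np.array([[3.0, 0.5], [0.5, 1.0]])


def grad(theta):
    # Gradient of the hypothetical loss above.
    return A @ theta + theta ** 3


def fgr_grad(theta, gamma, eps):
    # Assumed forward (ascent-step) finite-difference GR gradient.
    g = grad(theta)
    return g + (gamma / eps) * (grad(theta + eps * g) - g)


def sam_grad(theta, rho, normalize=True):
    # SAM-style gradient: evaluate grad L at theta + rho * direction.
    g = grad(theta)
    direction = g / np.linalg.norm(g) if normalize else g
    return grad(theta + rho * direction)


theta = np.array([1.0, -0.5])
eps = 0.05
g = grad(theta)
same_unnormalized = np.allclose(
    fgr_grad(theta, gamma=eps, eps=eps), sam_grad(theta, rho=eps, normalize=False)
)
same_normalized = np.allclose(
    fgr_grad(theta, gamma=eps, eps=eps), sam_grad(theta, rho=eps * np.linalg.norm(g))
)
print(same_unnormalized, same_normalized)  # prints: True True
```

With γ ≠ ε the two gradients differ, so this identity covers only one special case; the paper's own parameterization of the relationship may differ from this sketch.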

Findings

01

Finite-difference computation reduces GR's computational cost.

02

Finite-difference GR improves generalization performance.

03

In diagonal linear networks, GR is implicitly biased toward the rich regime, and finite-difference computation strengthens this bias.
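
To make the setting of the third finding concrete, here is a minimal NumPy sketch of finite-difference GR applied to a diagonal linear network. It assumes the common parameterization w = u ⊙ u − v ⊙ v, a squared loss, and the forward (ascent-step) finite difference; the data, initialization scale, step sizes, and function names are hypothetical. The initialization scale `init_scale` is the knob usually used to move such networks between the rich regime (small initialization) and the kernel regime (large initialization). The sketch only sets up the model and the update; it does not reproduce the paper's analysis or results.

```python
import numpy as np


def make_problem(n=10, d=20, seed=0):
    # Hypothetical sparse regression: fewer samples than features, 2 active weights.
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    w_star = np.zeros(d)
    w_star[:2] = 1.0
    return X, X @ w_star


def weights(params):
    # Assumed diagonal linear network: w = u*u - v*v (elementwise).
    u, v = params
    return u * u - v * v


def loss(params, X, y):
    # Squared loss ||X w - y||^2 / (2 n).
    r = X @ weights(params) - y
    return 0.5 * np.mean(r ** 2)


def grad(params, X, y):
    # Chain rule: dL/du = 2 u * dL/dw, dL/dv = -2 v * dL/dw.
    u, v = params
    g_w = X.T @ (X @ weights(params) - y) / len(y)
    return np.stack([2.0 * u * g_w, -2.0 * v * g_w])


def fgr_step(params, X, y, lr, gamma, eps):
    # One step of forward finite-difference GR (two gradient evaluations).
    g = grad(params, X, y)
    g_ascent = grad(params + eps * g, X, y)      # gradient after an ascent step
    update = g + (gamma / eps) * (g_ascent - g)  # approx. grad of L + (gamma/2)||grad L||^2
    return params - lr * update                  # descent step


def train(init_scale=0.1, steps=2000, lr=0.01, gamma=0.01, eps=0.01):
    X, y = make_problem()
    params = init_scale * np.ones((2, X.shape[1]))  # u = v, so w starts at 0
    start = loss(params, X, y)
    for _ in range(steps):
        params = fgr_step(params, X, y, lr, gamma, eps)
    return weights(params), start, loss(params, X, y)


w, start, end = train(init_scale=0.1)
print("loss:", start, "->", end)
print("l1 norm of w:", np.abs(w).sum())
```

Rerunning `train` with a larger `init_scale` and comparing the ℓ1 norm of the learned `w` is one way to explore the regime dependence informally.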

Abstract

Gradient regularization (GR) is a method that penalizes the gradient norm of the training loss during training. While some studies have reported that GR can improve generalization performance, little attention has been paid to it from an algorithmic perspective, that is, to algorithms for GR that improve performance efficiently. In this study, we first reveal that a specific finite-difference computation, composed of both gradient ascent and descent steps, reduces the computational cost of GR. Next, we show that the finite-difference computation also yields better generalization performance. We theoretically analyze a solvable model, a diagonal linear network, and clarify that GR has a desirable implicit bias toward the so-called rich regime and that finite-difference computation strengthens this bias. Furthermore, finite-difference GR is closely related to some other algorithms…
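
The GR objective and the finite-difference idea can be sketched in a few lines of NumPy. This is a minimal sketch, assuming the regularized loss L(θ) + (γ/2)‖∇L(θ)‖², whose gradient is ∇L + γH∇L with H the Hessian, and assuming the Hessian-vector product H∇L is replaced by a one-sided difference of gradients along ∇L: the ascent variant evaluates the gradient at θ + ε∇L, the descent variant at θ − ε∇L. Either way the update needs two gradient evaluations and no Hessian or double backpropagation. The logistic-regression toy problem, function names, and constants are hypothetical; the exact form and hyperparameters of the paper's method may differ.

```python
import numpy as np

# Hypothetical toy problem: logistic regression on random data, chosen because
# its Hessian has a closed form we can check the finite difference against.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = (rng.random(50) < 0.5).astype(float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss(theta):
    # Mean binary cross-entropy, log(1 + e^z) - y z, computed stably.
    z = X @ theta
    return np.mean(np.logaddexp(0.0, z) - y * z)


def grad(theta):
    # Gradient of the mean cross-entropy: X^T (sigmoid(X theta) - y) / n.
    return X.T @ (sigmoid(X @ theta) - y) / len(y)


def hessian(theta):
    # Closed-form Hessian X^T diag(p (1 - p)) X / n, used only as a reference.
    p = sigmoid(X @ theta)
    return (X * (p * (1.0 - p))[:, None]).T @ X / len(y)


def gr_loss(theta, gamma):
    # Regularized objective L(theta) + (gamma / 2) * ||grad L(theta)||^2.
    g = grad(theta)
    return loss(theta) + 0.5 * gamma * g @ g


def gr_grad_exact(theta, gamma):
    # Exact gradient of gr_loss: g + gamma * H g (requires the Hessian).
    g = grad(theta)
    return g + gamma * hessian(theta) @ g


def gr_grad_fd(theta, gamma, eps, ascent=True):
    # Finite-difference gradient of gr_loss using two gradient evaluations.
    g = grad(theta)                   # evaluation 1 at theta
    step = eps if ascent else -eps    # ascent: +eps * g, descent: -eps * g
    g_shift = grad(theta + step * g)  # evaluation 2 at the shifted point
    hvp = (g_shift - g) / step        # one-sided estimate of H g
    return g + gamma * hvp


theta = 0.1 * rng.normal(size=5)
exact = gr_grad_exact(theta, gamma=0.1)
fwd = gr_grad_fd(theta, gamma=0.1, eps=1e-4, ascent=True)
bwd = gr_grad_fd(theta, gamma=0.1, eps=1e-4, ascent=False)
print("ascent-step error:", np.max(np.abs(exact - fwd)))
print("descent-step error:", np.max(np.abs(exact - bwd)))
```

In a training loop, one would step θ ← θ − η · `gr_grad_fd(...)`; the cost per step is then roughly twice that of plain gradient descent.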

Videos

Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias (SlidesLive)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM

Methods: Sharpness-Aware Minimization