Improving Generalization and Convergence by Enhancing Implicit Regularization

Mingze Wang; Jinbo Wang; Haotian He; Zilin Wang; Guanhua Huang; Feiyu Xiong; Zhiyu Li; Weinan E; Lei Wu

arXiv:2405.20763 · cs.LG · November 4, 2024

TL;DR

This paper introduces the Implicit Regularization Enhancement (IRE) framework, which accelerates the discovery of flat minima in deep learning, yielding better generalization and faster convergence without significant computational overhead.

Contribution

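The paper proposes IRE, a framework that decouples the training dynamics along flat and sharp directions: it boosts sharpness reduction along flat directions while keeping training stable along sharp directions, improving both generalization and convergence of deep learning models.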
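One way to write the decoupling described in the contribution statement, as a hedged sketch in our own notation (the page does not define these symbols): let $g_t$ be the base optimizer's update direction, $P_t$ a projection onto the estimated flat directions, $\eta$ the learning rate, and $\kappa \ge 0$ the enhancement strength. Splitting $g_t = P_t g_t + (I - P_t) g_t$ gives:

```latex
\theta_{t+1} = \theta_t - \eta\,\bigl(g_t + \kappa\,P_t g_t\bigr)
             = \theta_t - \eta\,(1+\kappa)\,P_t g_t \;-\; \eta\,(I - P_t)\,g_t
```

The flat component moves with an effective step size of $\eta(1+\kappa)$, while the sharp component keeps step size $\eta$. The stability constraint set by the sharp curvature (for gradient descent on a quadratic with curvature $\lambda$, $\eta\lambda < 2$) is therefore left unchanged.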

Findings

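01. IRE improves generalization on image classification benchmarks (CIFAR-10/100, ImageNet) for ResNets and ViTs.
02. IRE achieves a 2× speed-up over AdamW in Llama pre-training (model sizes from 60M to 229M).
03. Theoretical guarantees show that IRE accelerates convergence to flat minima.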
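The intuition behind faster convergence to flat minima can be illustrated on a toy problem; this does not reproduce the paper's theorem. The sketch assumes the loss L(x, y) = 0.5 · (1 + x²) · y², whose minimum manifold is y = 0 with sharpness 1 + x², so x is a flat direction and the flattest minimum sits at x = 0. It assumes SAM as the base optimizer and hand-picks x as the flat direction; the paper estimates flat directions from curvature information instead. `grad`, `sam_ire`, `kappa`, and `rho` are hypothetical names.

```python
import math
from typing import Tuple


def grad(x: float, y: float) -> Tuple[float, float]:
    """Gradient of the toy loss L(x, y) = 0.5 * (1 + x**2) * y**2.
    On the minimum manifold y = 0 the curvature along y is 1 + x**2,
    and x is a flat (zero-curvature) direction."""
    return x * y * y, (1.0 + x * x) * y


def sam_ire(steps: int, kappa: float, lr: float = 0.1, rho: float = 0.1,
            x: float = 1.0, y: float = 0.5) -> float:
    """Run SAM and boost only the x-component (the hand-picked flat direction)
    of the SAM gradient by (1 + kappa). Returns the final sharpness 1 + x**2."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        norm = math.hypot(gx, gy) + 1e-12  # guard against a zero gradient
        # SAM: evaluate the gradient at the normalized ascent-perturbed point.
        sx, sy = grad(x + rho * gx / norm, y + rho * gy / norm)
        x -= lr * (1.0 + kappa) * sx  # flat direction: boosted step size
        y -= lr * sy                  # sharp direction: step size unchanged
    return 1.0 + x * x


if __name__ == "__main__":
    for kappa in (0.0, 4.0):
        print(f"kappa={kappa}: final sharpness {sam_ire(1000, kappa):.6f}")
```

With the same number of steps, the boosted run ends closer to the flattest minimum (sharpness 1). Only the flat coordinate is boosted because scaling the sharp coordinate by the same factor would multiply lr · (1 + x²) and push it toward the stability limit.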

Abstract

In this work, we propose an Implicit Regularization Enhancement (IRE) framework to accelerate the discovery of flat solutions in deep learning, thereby improving generalization and convergence. Specifically, IRE decouples the dynamics of flat and sharp directions, which boosts the sharpness reduction along flat directions while maintaining training stability along sharp directions. We show that IRE can be practically incorporated with generic base optimizers without introducing significant computational overhead. Experiments show that IRE consistently improves the generalization performance for image classification tasks across a variety of benchmark datasets (CIFAR-10/100, ImageNet) and models (ResNets and ViTs). Surprisingly, IRE also achieves a 2× speed-up compared to AdamW in the pre-training of Llama models (of sizes ranging from 60M to 229M) on datasets…
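The abstract states that IRE can be combined with generic base optimizers. Below is a minimal pure-Python sketch of one way to do that, under several assumptions that the page does not confirm. First, the boost is applied to the base optimizer's update direction rather than the raw gradient, because with Adam a constant rescaling of the raw gradient would largely cancel after the second-moment normalization. Second, the flat directions are taken to be the fraction `gamma` of coordinates with the smallest curvature estimate. Third, Adam's second-moment estimate `v` serves as that curvature estimate. `flat_mask`, `ire_apply`, `AdamDirection`, `kappa`, and `gamma` are hypothetical names, not the API of the wmz9/ire-algorithm-framework repository.

```python
import math
from typing import List

Vector = List[float]


def flat_mask(curvature: Vector, gamma: float) -> List[bool]:
    """Hypothetical rule: the fraction `gamma` of coordinates with the smallest
    estimated curvature are treated as flat directions."""
    k = int(gamma * len(curvature))
    order = sorted(range(len(curvature)), key=lambda i: curvature[i])
    flat = set(order[:k])
    return [i in flat for i in range(len(curvature))]


def ire_apply(params: Vector, direction: Vector, curvature: Vector,
              lr: float, kappa: float, gamma: float) -> Vector:
    """theta <- theta - lr * (u + kappa * P u), with P a coordinate mask.
    Sharp coordinates keep step size lr; flat ones get lr * (1 + kappa)."""
    mask = flat_mask(curvature, gamma)
    return [p - lr * (1.0 + kappa if m else 1.0) * u
            for p, u, m in zip(params, direction, mask)]


class AdamDirection:
    """Adam's preconditioned direction u_t = m_hat / (sqrt(v_hat) + eps).
    Its second-moment estimate `v` is reused as a curvature proxy (assumption)."""

    def __init__(self, dim: int, beta1: float = 0.9, beta2: float = 0.999,
                 eps: float = 1e-8) -> None:
        self.m = [0.0] * dim
        self.v = [0.0] * dim
        self.t = 0
        self.beta1, self.beta2, self.eps = beta1, beta2, eps

    def __call__(self, grad: Vector) -> Vector:
        self.t += 1
        b1, b2 = self.beta1, self.beta2
        self.m = [b1 * m + (1 - b1) * g for m, g in zip(self.m, grad)]
        self.v = [b2 * v + (1 - b2) * g * g for v, g in zip(self.v, grad)]
        c1, c2 = 1 - b1 ** self.t, 1 - b2 ** self.t  # bias corrections
        return [(m / c1) / (math.sqrt(v / c2) + self.eps)
                for m, v in zip(self.m, self.v)]


if __name__ == "__main__":
    # SGD base: the direction is the raw gradient; curvature comes from elsewhere.
    # Coordinates 1 and 3 (smallest curvature) are flat and move 3x farther.
    print(ire_apply([0.0] * 4, [1.0] * 4, [10.0, 0.1, 5.0, 0.01],
                    lr=0.1, kappa=2.0, gamma=0.5))
    # Adam base: boost the preconditioned direction, rank coordinates by v.
    adam = AdamDirection(dim=2)
    u = adam([4.0, 0.01])
    print(ire_apply([1.0, 1.0], u, adam.v, lr=0.01, kappa=1.0, gamma=0.5))
```

Keeping the base optimizer as a separate "direction" function means the same `ire_apply` step can sit on top of SGD, Adam, or another update rule without changing the rule itself.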

Code & Models

Repositories

wmz9/ire-algorithm-framework (PyTorch, official)

Taxonomy

Topics: Neural Networks and Applications · Matrix Theory and Algorithms · Topology Optimization in Engineering

Methods: Sharpness-Aware Minimization · Balanced Selection · AdamW · LLaMA