Improving Generalization and Convergence by Enhancing Implicit Regularization
Mingze Wang, Jinbo Wang, Haotian He, Zilin Wang, Guanhua Huang, Feiyu Xiong, Zhiyu Li, Weinan E, Lei Wu

TL;DR
This paper introduces the Implicit Regularization Enhancement (IRE) framework that accelerates finding flat minima in deep learning, leading to better generalization and faster convergence without significant computational costs.
Contribution
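The paper proposes the IRE framework, which decouples the training dynamics along flat and sharp directions: it boosts sharpness reduction along flat directions while keeping training stable along sharp ones, improving generalization and convergence in deep learning models.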
Findings
IRE improves generalization for image classification on CIFAR-10/100 and ImageNet with ResNets and ViTs.
IRE achieves a 2x speed-up over AdamW in Llama model pre-training.
Theoretical guarantees show accelerated convergence to flat minima.
Abstract
In this work, we propose an Implicit Regularization Enhancement (IRE) framework to accelerate the discovery of flat solutions in deep learning, thereby improving generalization and convergence. Specifically, IRE decouples the dynamics of flat and sharp directions, which boosts the sharpness reduction along flat directions while maintaining the training stability in sharp directions. We show that IRE can be practically incorporated with generic base optimizers without introducing significant computational overload. Experiments show that IRE consistently improves the generalization performance for image classification tasks across a variety of benchmark datasets (CIFAR-10/100, ImageNet) and models (ResNets and ViTs). Surprisingly, IRE also achieves a speed-up compared to AdamW in the pre-training of Llama models (of sizes ranging from 60M to 229M) on datasets…
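The decoupling described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes that flat directions are approximated by the coordinates with the smallest entries of a diagonal curvature estimate, and that the base optimizer is plain gradient descent. The names `ire_style_step`, `kappa`, and `flat_fraction` are hypothetical and chosen only for illustration.

```python
import numpy as np


def ire_style_step(theta, grad, hess_diag, lr=0.15, kappa=9.0, flat_fraction=0.5):
    """One illustrative step: base gradient descent plus an extra push on flat coordinates.

    Assumption (not from the source): flat directions are the coordinates
    with the smallest absolute diagonal-curvature estimate.
    """
    # Number of coordinates treated as flat.
    k = int(flat_fraction * theta.size)
    flat_idx = np.argsort(np.abs(hess_diag))[:k]

    # 0/1 mask acting as a coordinate-wise projection onto the flat subspace.
    mask = np.zeros_like(theta)
    mask[flat_idx] = 1.0

    # The base update touches every coordinate; the kappa-scaled term
    # only amplifies the step along the masked (flat) coordinates.
    return theta - lr * (grad + kappa * mask * grad)


# Toy quadratic loss 0.5 * sum(h * theta**2) with one sharp and one flat coordinate.
h = np.array([10.0, 0.01])
theta = np.array([1.0, 1.0])
new_theta = ire_style_step(theta, h * theta, h)
print(new_theta)
```

In this toy setting the sharp coordinate takes the ordinary gradient step, so its stability is unchanged, while the flat coordinate moves (1 + kappa) times further than it would under the base optimizer. Applying the same amplification to the sharp coordinate would make the iteration diverge at this learning rate, which is the motivation for treating the two kinds of direction separately. The quadratic only demonstrates the mechanics of the decoupling; it does not reproduce the sharpness-reduction behavior the paper studies.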
Taxonomy
Topics: Neural Networks and Applications · Matrix Theory and Algorithms · Topology Optimization in Engineering
Methods: Sharpness-Aware Minimization · Balanced Selection · AdamW · LLaMA
