Perturbated Gradients Updating within Unit Space for Deep Learning
Ching-Hsun Tseng, Liu-Hsueh Cheng, Shin-Jye Lee, Xiaojun Zeng

TL;DR
This paper introduces PUGD, a new optimizer for deep learning that updates in unit space to promote flat minima and improve accuracy, demonstrated on image classification tasks.
Contribution
The paper proposes PUGD, a novel optimizer with normalized gradient updates in unit space, enhancing model performance and stability in deep learning.
Findings
PUGD achieves state-of-the-art Top-1 accuracy on Tiny ImageNet.
PUGD provides competitive results on CIFAR-10 and CIFAR-100.
PUGD's updates are locally bounded, so the change from one step to the next is controlled.
Abstract
In deep learning, optimization plays a vital role. Focusing on image classification, this work investigates the pros and cons of widely used optimizers and proposes a new optimizer: the Perturbated Unit Gradient Descent (PUGD) algorithm, which extends the normalized gradient operation on tensors with a perturbation so that updates take place in unit space. Through a set of experiments and analyses, we show that PUGD's updates are locally bounded, meaning that the change from one step to the next is controlled. Moreover, PUGD can push models toward a flat minimum, where the error remains approximately constant, not only because gradient normalization naturally avoids stationary points but also because it scans sharpness within the unit ball. In a series of rigorous experiments, PUGD helps models attain state-of-the-art Top-1 accuracy on Tiny ImageNet and competitive performance on CIFAR-{10, 100}. We…
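The idea of a perturbed, normalized update can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy quadratic loss, the perturbation radius of 1, and the learning rate are all assumptions made for illustration, and the paper's exact perturbation scaling is not reproduced here. The sketch only shows the general pattern the abstract describes: probe a point inside the unit ball around the current weights, take the gradient there, and step along its unit direction so that every update has length equal to the learning rate.

```python
import math


def l2_norm(v):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in v))


def unit(v, eps=1e-12):
    """Scale v to (approximately) unit length; eps guards against division by zero."""
    n = l2_norm(v)
    return [x / (n + eps) for x in v]


def perturbed_unit_step(w, grad_fn, lr=0.1):
    """One illustrative perturb-then-normalize step (hypothetical helper).

    1. Compute the gradient at w.
    2. Move to a probe point on the unit ball around w (radius 1 is an assumption).
    3. Compute the gradient at the probe point.
    4. Step from the ORIGINAL w along the unit direction of that gradient.
    """
    g = grad_fn(w)
    w_probe = [wi + ui for wi, ui in zip(w, unit(g))]
    g_probe = grad_fn(w_probe)
    u = unit(g_probe)
    return [wi - lr * ui for wi, ui in zip(w, u)]


def toy_grad(w):
    """Gradient of the toy loss 0.5 * (w0**2 + 10 * w1**2), chosen only for the demo."""
    return [w[0], 10.0 * w[1]]


w0 = [3.0, -2.0]
w1 = perturbed_unit_step(w0, toy_grad, lr=0.1)

# The step length equals the learning rate regardless of the gradient's magnitude.
step_len = l2_norm([a - b for a, b in zip(w1, w0)])
print(w1, step_len)
```

Because the final direction is normalized, the distance moved per step does not depend on how large or small the raw gradient is, which is one simple way to read the "locally bounded" property mentioned above.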
Taxonomy
Topics: Advanced Neural Network Applications · Tensor decomposition and applications · Sparse and Compressive Sensing Techniques
Methods: Gradient Normalization · Sharpness-Aware Minimization
