The Unified Balance Theory of Second-Moment Exponential Scaling Optimizers in Visual Tasks

Gongyue Zhang; Honghai Liu

arXiv:2405.18498·cs.LG·May 30, 2024

Open Access

TL;DR

This paper introduces a unified framework for first-order optimizers in visual tasks using variable Second-Moment Exponential Scaling, addressing issues like gradient vanishing and dataset sparsity through a new balance theory.

Contribution

It proposes a novel unification of SGD and adaptive optimizers via variable exponential scaling, grounded in a new balance theory for optimization.

Findings

01. Different balance coefficients significantly affect training dynamics.

02. The unified approach improves optimization stability.

03. Experimental results confirm the theory's effectiveness.

Abstract

We identify a potential method for unifying first-order optimizers through variable Second-Moment Exponential Scaling (SMES). Starting from backpropagation, we address classic phenomena such as vanishing and exploding gradients, as well as issues related to dataset sparsity, and introduce a theory of balance in optimization. Through this theory, we suggest that SGD and adaptive optimizers can be unified under a broader inference, employing variable exponential scaling to strike a balance within a generalized formula for first-order optimizers. We ran tests on several classic datasets and networks to confirm the impact of different balance coefficients on the overall training process.
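The abstract does not state the generalized formula, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes the "balance coefficient" is an exponent p applied to the bias-corrected second-moment estimate: p = 0 reduces the step to SGD with momentum, and p = 0.5 gives an Adam-style step. The names smes_step, init_state, and p are hypothetical and chosen for illustration.

```python
def init_state(n):
    """Optimizer state for n scalar parameters (hypothetical helper)."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n}


def smes_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999,
              p=0.5, eps=1e-8):
    """One update with the second moment raised to a variable exponent p.

    Assumption (not taken from the paper): the update is
        w <- w - lr * m_hat / (v_hat ** p + eps)
    so p = 0 behaves like SGD with momentum and p = 0.5 like Adam.
    """
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (w, g) in enumerate(zip(params, grads)):
        # Exponential moving averages of the gradient and its square.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # Bias correction, as in Adam.
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        # Variable exponent on the second moment sets the balance.
        new_params.append(w - lr * m_hat / (v_hat ** p + eps))
    return new_params


if __name__ == "__main__":
    w, g = [1.0, -2.0], [0.5, -4.0]
    for p in (0.0, 0.25, 0.5):
        print(p, smes_step(w, g, init_state(len(w)), lr=0.1, p=p))
```

Under this assumption, intermediate values of p interpolate between the two regimes: on the first step, p = 0 scales the update by the raw gradient magnitude, while p = 0.5 normalizes it to roughly lr per coordinate.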

Taxonomy

Topics: Visual perception and processing mechanisms · Data Visualization and Analytics

Methods: Stochastic Gradient Descent