Dynamically Weighted Momentum with Adaptive Step Sizes for Efficient Deep Network Training

Zhifeng Wang; Longlong Li; Chunyan Zeng

arXiv:2510.25042 · cs.LG · October 30, 2025

TL;DR

This paper introduces DWMGrad, a novel optimization algorithm that adaptively adjusts momentum and learning rates based on historical data, improving convergence speed and accuracy in deep network training.

Contribution

The paper presents DWMGrad, an adaptive optimizer that dynamically updates momentum and step sizes, addressing limitations of existing methods in complex, non-convex training scenarios.
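The contribution above says DWMGrad adjusts momentum and step sizes using historical data, but this excerpt does not state the actual update rule. The Python sketch below is therefore a hypothetical illustration of the general idea. It is not DWMGrad. Several parts are assumptions made only for this example: the helper names `momentum_weight` and `dynamic_step`, the cosine-agreement rule for choosing the momentum weight, and the constants `beta_min`, `beta_max`, `beta2`, and `lr`. The momentum weight rises when the new gradient agrees with the accumulated momentum and falls when it disagrees. Each parameter's step is scaled by a running squared-gradient history in the style of RMSProp/Adam.

```python
import math

def momentum_weight(grad, m, beta_min=0.5, beta_max=0.95):
    """Hypothetical rule (not from the paper): map gradient/momentum agreement to a weight."""
    dot = sum(g * mi for g, mi in zip(grad, m))
    norm = math.sqrt(sum(g * g for g in grad)) * math.sqrt(sum(mi * mi for mi in m))
    cos = dot / norm if norm > 0 else 1.0  # no momentum history yet: treat as full agreement
    return beta_min + (beta_max - beta_min) * (1.0 + cos) / 2.0

def dynamic_step(x, m, s, grad, lr=0.1, beta2=0.999, eps=1e-8):
    """One illustrative step; NOT the DWMGrad update, which this excerpt does not give."""
    beta = momentum_weight(grad, m)  # momentum weight chosen from recent history
    new_x, new_m, new_s = [], [], []
    for xi, mi, si, gi in zip(x, m, s, grad):
        mi = beta * mi + (1.0 - beta) * gi          # history-weighted momentum
        si = beta2 * si + (1.0 - beta2) * gi * gi   # running squared-gradient history
        new_x.append(xi - lr * mi / (math.sqrt(si) + eps))  # per-parameter adaptive step
        new_m.append(mi)
        new_s.append(si)
    return new_x, new_m, new_s

# Usage: minimize f(x) = (x0 - 3)^2 + (x1 + 1)^2 starting from the origin.
target = [3.0, -1.0]

def loss(x):
    return sum((xi - ci) ** 2 for xi, ci in zip(x, target))

x, m, s = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
initial_loss = loss(x)
for _ in range(500):
    grad = [2.0 * (xi - ci) for xi, ci in zip(x, target)]
    x, m, s = dynamic_step(x, m, s, grad)
final_loss = loss(x)
print(f"loss: {initial_loss:.4f} -> {final_loss:.6f}")
```

Damping the momentum when the gradient reverses direction is one simple way to let the optimizer's history shape its behavior. The paper's own dynamic guidance mechanism may differ substantially.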

Findings

1. Faster convergence rates were demonstrated across multiple experiments.
2. Higher accuracy was achieved than with traditional optimizers.
3. The optimizer adapts effectively to changing training environments.

Abstract

Although optimization algorithms such as Stochastic Gradient Descent (SGD) and Adaptive Moment Estimation (Adam) are widely used in deep learning, they remain poorly equipped to handle fluctuations in learning efficiency, the demands of complex models, and non-convex optimization problems. These shortcomings stem mainly from the algorithms' limited ability to cope with complex data structures and models. For instance, choosing an appropriate learning rate, avoiding local optima, and navigating high-dimensional spaces all remain difficult. To address these issues, this paper introduces a novel optimization algorithm named DWMGrad. Building on traditional methods, DWMGrad incorporates a dynamic guidance mechanism that relies on historical data to dynamically update momentum and…
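For reference, the two baselines the abstract names can each be written as a single-step update rule. The following Python sketch uses the standard textbook forms of SGD with heavy-ball momentum and of Adam for a scalar parameter. The function names, hyperparameter values, and the toy objective f(x) = (x - 3)^2 are illustrative choices, not taken from the paper. In both rules the momentum coefficients are fixed constants, whereas the abstract describes DWMGrad as updating momentum dynamically.

```python
import math

def sgd_momentum_step(x, v, grad, lr=0.1, beta=0.9):
    """SGD with heavy-ball momentum: fixed momentum weight, one global step size."""
    v = beta * v + grad   # velocity accumulates past gradients
    x = x - lr * v
    return x, v

def adam_step(x, m, s, grad, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Standard Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad           # first-moment (momentum) estimate
    s = beta2 * s + (1.0 - beta2) * grad * grad    # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                 # bias-corrected moments
    s_hat = s / (1.0 - beta2 ** t)
    x = x - lr * m_hat / (math.sqrt(s_hat) + eps)  # adaptive step size
    return x, m, s

def grad_f(x):
    """Gradient of the toy objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)

x_sgd, v = 0.0, 0.0
for _ in range(200):
    x_sgd, v = sgd_momentum_step(x_sgd, v, grad_f(x_sgd))

x_adam, m, s = 0.0, 0.0, 0.0
for t in range(1, 1001):
    x_adam, m, s = adam_step(x_adam, m, s, grad_f(x_adam), t)

print(f"SGD+momentum: {x_sgd:.4f}, Adam: {x_adam:.4f}")
```

Both runs approach the minimizer x = 3 on this convex toy problem. The difficulties the abstract raises (learning-rate selection, local optima, high-dimensional non-convex landscapes) only appear on harder objectives.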
