# A dual enhanced stochastic gradient descent method with dynamic momentum and step size adaptation for improved optimization performance

**Authors:** Mohamed A. Mokhtar, Mohamed Fathy, Yasser A. Dahab, Emad A. Sayed

PMC · DOI: 10.1038/s41598-025-24689-y · 2025-11-18

## TL;DR

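This paper introduces dual enhanced SGD (DESGD), an optimizer that adapts both momentum and step size within the SGDM update rules. It reaches comparable errors in far fewer iterations than SGDM and Adam on two test functions and improves accuracy over SGDM by about 1–2% on MNIST.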

## Contribution

DESGD keeps the update rules of SGD with momentum (SGDM) but dynamically adapts both the momentum and the step size, rather than holding them fixed.
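
For orientation, one common way to write the SGDM update that DESGD builds on is shown below. The notation ($\theta_t$ for parameters, $v_t$ for the velocity, $\beta$ for momentum, $\alpha$ for step size) is an assumption made here for illustration and is not taken from the paper.

$$
\begin{aligned}
v_{t+1} &= \beta\, v_t + \nabla f(\theta_t), \\
\theta_{t+1} &= \theta_t - \alpha\, v_{t+1}.
\end{aligned}
$$

Read this way, adapting "both momentum and step size" would correspond to replacing the constants $\beta$ and $\alpha$ with iteration-dependent values $\beta_t$ and $\alpha_t$. The rules DESGD uses to choose them are defined in the full paper and are not reproduced in this summary.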

## Key findings

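- On the Rosenbrock and Sum Square test functions, DESGD reaches comparable errors with up to 81–95% fewer iterations and 66–91% less CPU time than SGDM, and with 67–78% fewer iterations and 62–70% shorter runtimes than Adam.
- On the MNIST dataset, DESGD achieves the highest accuracy and lowest test loss across most batch sizes, improving on SGDM by about 1–2% and performing on par with or slightly better than Adam.
- SGDM remains the fastest optimizer per step; DESGD's per-iteration cost is in line with adaptive optimizers such as Adam, which the authors consider justified by the accuracy and loss gains.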

## Abstract

In modern machine learning, optimization algorithms are crucial; they steer the training process by navigating complex, high-dimensional loss landscapes. Among these, stochastic gradient descent with momentum (SGDM) is widely adopted for its ability to accelerate convergence in shallow regions. However, SGDM struggles in challenging optimization landscapes, where narrow, curved valleys can lead to oscillations and slow progress. This paper introduces dual enhanced SGD (DESGD), which addresses these limitations by dynamically adapting both momentum and step size within the same update rules as SGDM. On two optimization test functions, the Rosenbrock and Sum Square functions, the proposed optimizer typically performs better than SGDM and Adam. For example, it reaches comparable errors with up to 81–95% fewer iterations and 66–91% less CPU time than SGDM, and with 67–78% fewer iterations and 62–70% shorter runtimes than Adam. On the MNIST dataset, the proposed optimizer achieved the highest accuracies and lowest test losses across the majority of batch sizes. Compared to SGDM, it consistently improved accuracy by about 1–2%, while performing on par with or slightly better than Adam in accuracy and error. Although SGDM remained the fastest per-step optimizer, the proposed method's computational cost is in line with that of other adaptive optimizers such as Adam. The authors argue that this marginal increase in per-iteration overhead is justified by the substantial gains in model accuracy and the reduction in training loss, giving a favorable cost-to-performance ratio. The results suggest that DESGD is a promising practical optimizer for scenarios that demand stability in challenging landscapes.
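
To make the benchmark setup concrete, the following is a minimal sketch of the SGDM baseline on the two test functions named in the abstract. It is not the authors' code and does not implement DESGD, whose adaptation rules are not given in this summary. Several things here are assumptions: the Sum Square function is taken in its standard form $f(x) = \sum_i i\,x_i^2$, the Rosenbrock function in its two-dimensional form, and the function names, hyperparameters (`lr`, `beta`, `tol`), and stopping criterion are illustrative choices. Because these test functions are deterministic, the "stochastic" part of SGDM reduces to plain gradient descent with momentum.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def sum_squares(x: Vector) -> float:
    """Sum Squares test function (assumed standard form): sum_i i * x_i^2."""
    return sum((i + 1) * xi * xi for i, xi in enumerate(x))


def sum_squares_grad(x: Vector) -> Vector:
    """Gradient of the Sum Squares function: 2 * i * x_i."""
    return [2.0 * (i + 1) * xi for i, xi in enumerate(x)]


def rosenbrock(x: Vector) -> float:
    """Two-dimensional Rosenbrock function with its minimum at (1, 1)."""
    a, b = x
    return (1.0 - a) ** 2 + 100.0 * (b - a * a) ** 2


def rosenbrock_grad(x: Vector) -> Vector:
    """Analytic gradient of the two-dimensional Rosenbrock function."""
    a, b = x
    return [
        -2.0 * (1.0 - a) - 400.0 * a * (b - a * a),
        200.0 * (b - a * a),
    ]


def sgdm(
    grad: Callable[[Vector], Vector],
    x0: Vector,
    lr: float = 0.05,      # fixed step size (illustrative value)
    beta: float = 0.9,     # fixed momentum (illustrative value)
    tol: float = 1e-8,     # stop when the largest gradient entry is below tol
    max_iter: int = 10_000,
) -> Tuple[Vector, int]:
    """Gradient descent with fixed momentum; returns (final point, iterations)."""
    x = list(x0)
    v = [0.0] * len(x)
    for t in range(max_iter):
        g = grad(x)
        if max(abs(gi) for gi in g) < tol:
            return x, t
        # Velocity accumulates past gradients; the step follows the velocity.
        v = [beta * vi + gi for vi, gi in zip(v, g)]
        x = [xi - lr * vi for xi, vi in zip(x, v)]
    return x, max_iter


if __name__ == "__main__":
    x_final, iters = sgdm(sum_squares_grad, [1.0] * 5)
    print("iterations:", iters, "final value:", sum_squares(x_final))
```

The iteration count returned by `sgdm` is the kind of quantity the abstract compares across optimizers. On the Rosenbrock function the same routine needs a much smaller step size to stay stable in the curved valley, which is the difficulty that motivates adapting momentum and step size during the run.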

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12627789/full.md

---
Source: https://tomesphere.com/paper/PMC12627789