Dyna: A Method of Momentum for Stochastic Optimization
Zhidong Han

TL;DR
Dyna introduces a momentum-based stochastic optimization method inspired by Newtonian dynamics, employing adaptive stepsizes for each neural network layer to improve convergence efficiency.
Contribution
It presents a novel momentum gradient descent algorithm with adaptive stepsizes based on physical principles, enhancing convergence in stochastic optimization.
Findings
Preliminary results show improved convergence.
The algorithm is computationally efficient, with memory requirements similar to those of the Adam optimizer.
Adaptive stepsizes enhance optimization performance.
Abstract
An algorithm is presented for momentum gradient descent optimization based on the first-order differential equation of Newtonian dynamics. A fictitious mass is introduced into the momentum dynamics to regularize the adaptive stepsize of each individual parameter. Dynamic relaxation is adapted for stochastic optimization of nonlinear objective functions through explicit time integration with a varying damping ratio. The adaptive stepsize is optimized for each individual neural network layer based on the number of inputs. The adaptive stepsize for every parameter over the entire neural network is uniformly optimized with one upper bound, independent of sparsity, for a better overall convergence rate. The numerical implementation of the algorithm is similar to that of the Adam optimizer, with comparable computational efficiency, similar memory requirements, etc. There are three…
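The abstract is truncated and does not give Dyna's update rule, so the sketch below is not the paper's algorithm. It only illustrates the general idea the abstract builds on: treating momentum descent as explicit time integration of damped Newtonian dynamics, m dv/dt = -c v - grad f(x). All names and values (damped_dynamics_descent, mass, damping, dt, quad_grad) are assumptions chosen for illustration, and the damping is held constant here, whereas the abstract describes a varying damping ratio and per-layer adaptive stepsizes.

```python
from typing import Callable, List, Sequence


def damped_dynamics_descent(
    grad: Callable[[Sequence[float]], List[float]],
    x0: Sequence[float],
    mass: float = 1.0,      # fictitious mass (illustrative value, an assumption)
    damping: float = 1.0,   # constant damping coefficient (an assumption)
    dt: float = 0.1,        # time step of the explicit integration
    steps: int = 500,
) -> List[float]:
    """Integrate dp/dt = -(damping/mass) * p - grad f(x), dx/dt = p / mass.

    With beta = 1 - dt * damping / mass and learning rate dt**2 / mass,
    this is algebraically the classical heavy-ball momentum update.
    """
    x = list(x0)
    p = [0.0] * len(x)                  # momentum starts at rest
    decay = 1.0 - dt * damping / mass   # per-step momentum retention
    for _ in range(steps):
        g = grad(x)
        # Update momentum first, then position with the new momentum
        # (semi-implicit Euler).
        p = [decay * pi - dt * gi for pi, gi in zip(p, g)]
        x = [xi + dt * pi / mass for xi, pi in zip(x, p)]
    return x


def quad_grad(x: Sequence[float]) -> List[float]:
    """Gradient of the toy objective f(x) = 0.5 * (x0**2 + 10 * x1**2)."""
    return [1.0 * x[0], 10.0 * x[1]]


# Usage: minimize the toy quadratic from a nonzero starting point.
x_final = damped_dynamics_descent(quad_grad, [3.0, -2.0])
print(x_final)
```

In this reading, a larger mass shrinks the effective stepsize (dt**2 / mass) and a larger damping coefficient lowers the momentum retention, which is one way to see why a per-parameter mass can act as a stepsize regularizer.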
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Model Reduction and Neural Networks
Methods: Adam
