Dyna: A Method of Momentum for Stochastic Optimization

Zhidong Han

arXiv:1805.04933 · cs.LG · May 15, 2018 · 1 citation

TL;DR

Dyna introduces a momentum-based stochastic optimization method inspired by Newtonian dynamics, employing adaptive stepsizes for each neural network layer to improve convergence efficiency.

Contribution

The paper presents a momentum gradient descent algorithm whose adaptive stepsizes are derived from Newtonian dynamics, with the aim of improving convergence in stochastic optimization.

Findings

1. Preliminary results show improved convergence.
2. The algorithm is computationally efficient.
3. Adaptive stepsizes enhance optimization performance.

Abstract

An algorithm is presented for momentum gradient descent optimization based on the first-order differential equation of Newtonian dynamics. A fictitious mass is introduced into the momentum dynamics to regularize the adaptive stepsize of each individual parameter. Dynamic relaxation is adapted for stochastic optimization of nonlinear objective functions through explicit time integration with a varying damping ratio. The adaptive stepsize is optimized for each individual neural network layer based on its number of inputs. The adaptive stepsize of every parameter across the entire neural network is uniformly optimized with one upper bound, independent of sparsity, for a better overall convergence rate. The numerical implementation of the algorithm is similar to that of the Adam optimizer, with comparable computational efficiency and memory requirements. There are three…
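The link between momentum and Newtonian dynamics that the abstract describes can be illustrated with a generic sketch. This is not the Dyna algorithm itself, whose details are not given on this page. It assumes a single scalar mass, a constant damping coefficient, and a fixed time step, whereas the abstract describes per-parameter fictitious masses and a varying damping ratio. The function name and all parameter names are hypothetical. The sketch integrates the damped equation of motion m dv/dt = -c v - grad f(x) with an explicit (semi-implicit Euler) scheme.

```python
from typing import Callable, List, Sequence


def damped_dynamics_descent(
    grad: Callable[[List[float]], List[float]],
    x0: Sequence[float],
    mass: float = 1.0,      # assumed: one scalar fictitious mass for all parameters
    damping: float = 2.0,   # assumed: constant damping coefficient c
    dt: float = 0.1,        # assumed: fixed time step
    steps: int = 500,
) -> List[float]:
    """Minimize f by integrating m * dv/dt = -c * v - grad f(x) in time."""
    x = list(x0)
    v = [0.0] * len(x)  # start at rest
    for _ in range(steps):
        g = grad(x)
        # Velocity update: damping and gradient forces divided by the mass.
        v = [vi + dt * (-damping * vi - gi) / mass for vi, gi in zip(v, g)]
        # Position update uses the new velocity (semi-implicit Euler).
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Usage: an ill-conditioned quadratic f(x) = 0.5 * (x0**2 + 10 * x1**2).
x_min = damped_dynamics_descent(lambda p: [p[0], 10.0 * p[1]], [1.0, -2.0])
```

Under these assumptions the scheme is algebraically the classical heavy-ball momentum update with momentum coefficient 1 - c*dt/m and learning rate dt^2/m, so a larger mass yields a smaller effective stepsize. That is one way to read the abstract's statement that the fictitious mass regularizes the stepsize.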

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Model Reduction and Neural Networks

Methods: Adam