Stochastic Mirror Descent in Average Ensemble Models
Taylan Kargin, Fariborz Salehi, Babak Hassibi

TL;DR
This paper investigates the behavior of stochastic mirror descent in mean-field ensemble models, deriving a PDE for the parameter distribution evolution and analyzing the influence of mirror potentials on training dynamics and performance.
Contribution
The paper generalizes previous results for SGD to SMD, deriving a PDE for the continuous-time limit and analyzing the impact of mirror potentials on training dynamics.
Findings
Derived a nonlinear PDE for the distribution evolution in large networks
Showed the mirror potential influences training via a Riemannian metric
Numerical simulations illustrate the effect of mirror potentials on classification performance
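A short worked derivation may help show where an inverse-Hessian term comes from. This is the standard continuous-time limit of mirror descent for a single parameter vector, not the paper's PDE for the parameter distribution. The symbols $w$ (parameters), $\psi$ (a strictly convex mirror potential), $\ell$ (the loss), and $\eta$ (the step size) are notation assumed here for illustration.

```latex
% Discrete mirror descent update (stochastic when \nabla \ell is a sampled gradient):
\nabla\psi(w_{k+1}) = \nabla\psi(w_k) - \eta\, \nabla \ell(w_k)

% Letting \eta \to 0 with t = k\eta gives the mirror flow:
\frac{d}{dt}\, \nabla\psi\bigl(w(t)\bigr) = -\nabla \ell\bigl(w(t)\bigr)

% Applying the chain rule to the left-hand side:
\nabla^2\psi\bigl(w(t)\bigr)\, \dot{w}(t) = -\nabla \ell\bigl(w(t)\bigr)
\quad\Longrightarrow\quad
\dot{w}(t) = -\bigl[\nabla^2\psi\bigl(w(t)\bigr)\bigr]^{-1} \nabla \ell\bigl(w(t)\bigr)
```

For $\psi(w) = \tfrac{1}{2}\lVert w \rVert_2^2$ the Hessian is the identity and the flow reduces to ordinary gradient flow. For other potentials, the flow is a gradient flow with respect to the Riemannian metric defined by $\nabla^2\psi$.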
Abstract
The stochastic mirror descent (SMD) algorithm is a general class of training algorithms that includes the celebrated stochastic gradient descent (SGD) as a special case. It utilizes a mirror potential to influence the implicit bias of the training algorithm. In this paper we explore the performance of the SMD iterates on mean-field ensemble models. Our results generalize earlier ones obtained for SGD on such models. The evolution of the distribution of parameters is mapped to a continuous-time process in the space of probability distributions. Our main result gives a nonlinear partial differential equation to which the continuous-time process converges in the asymptotic regime of large networks. The impact of the mirror potential appears through a multiplicative term that is equal to the inverse of its Hessian and which can be interpreted as defining a gradient flow over an…
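As a rough illustration of the SMD update the abstract describes, here is a minimal sketch on a toy linear regression problem. It is not the paper's ensemble model or experimental setup. The separable potential psi(w) = (1/q) * sum_i |w_i|^q, the function names, and all hyperparameters are assumptions chosen for illustration. With q = 2 the update reduces to plain SGD.

```python
import numpy as np

def mirror_map(w, q):
    """Gradient of the assumed potential psi(w) = (1/q) * sum_i |w_i|**q."""
    return np.sign(w) * np.abs(w) ** (q - 1)

def inverse_mirror_map(z, q):
    """Inverse of mirror_map, valid for q > 1."""
    return np.sign(z) * np.abs(z) ** (1.0 / (q - 1))

def smd_step(w, grad, lr, q):
    """One SMD step: take the gradient step in the mirror domain, then map back."""
    z = mirror_map(w, q) - lr * grad
    return inverse_mirror_map(z, q)

def run_smd(X, y, q, lr=0.01, steps=2000, seed=0):
    """Run SMD on squared loss, sampling one data point per step (toy setup)."""
    rng = np.random.default_rng(seed)
    w = np.full(X.shape[1], 0.1)  # small nonzero initialization (an assumption)
    for _ in range(steps):
        i = rng.integers(len(y))
        grad = (X[i] @ w - y[i]) * X[i]  # gradient of 0.5 * (x_i . w - y_i)**2
        w = smd_step(w, grad, lr, q)
    return w

def mean_squared_loss(X, y, w):
    return 0.5 * np.mean((X @ w - y) ** 2)
```

Calling `run_smd(X, y, q=2)` and `run_smd(X, y, q=3)` on the same data lets one compare how the choice of potential changes the trajectory. The two runs apply the same stochastic gradients but rescale them differently in parameter space.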
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Neural Networks and Applications
Methods: Stochastic Gradient Descent
