Stochastic Mirror Descent in Average Ensemble Models

Taylan Kargin; Fariborz Salehi; Babak Hassibi

arXiv:2210.15323 · cs.LG · October 28, 2022 · 1 citation

Open Access

TL;DR

This paper studies stochastic mirror descent (SMD) on mean-field ensemble models. It derives a PDE that governs the evolution of the parameter distribution and analyzes how the choice of mirror potential affects training dynamics and performance.

Contribution

It generalizes previous results for SGD to SMD, deriving a PDE for the continuous limit and analyzing the impact of mirror potentials on training dynamics.

Findings

01 Derived a nonlinear PDE for the evolution of the parameter distribution in the large-network limit

02 Showed that the mirror potential influences training through the inverse of its Hessian, which can be interpreted as a Riemannian metric

03 Illustrated, through numerical simulations, the effect of mirror potentials on classification performance
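As a rough illustration of how an inverse Hessian enters, the standard finite-dimensional mirror descent update and its continuous-time limit can be written as below. This is a textbook sketch for a single parameter vector, not the paper's PDE over distributions; the symbols $w$, $\psi$, $L$, $\ell$, and $\eta$ (parameters, mirror potential, expected loss, per-sample loss, step size) are notation assumed here for illustration.

```latex
% Discrete SMD update with a strictly convex, differentiable potential \psi:
\nabla\psi(w_{k+1}) = \nabla\psi(w_k) - \eta\,\nabla \ell(w_k)

% Small-step limit (mirror flow), then the chain rule on the left-hand side:
\frac{d}{dt}\,\nabla\psi\bigl(w(t)\bigr) = -\nabla L\bigl(w(t)\bigr)
\quad\Longrightarrow\quad
\dot{w}(t) = -\bigl[\nabla^2\psi\bigl(w(t)\bigr)\bigr]^{-1}\,\nabla L\bigl(w(t)\bigr)

% Special case \psi(w) = \tfrac{1}{2}\lVert w\rVert_2^2: the Hessian is the identity,
% and the flow reduces to ordinary gradient flow \dot{w} = -\nabla L(w).
```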

Abstract

The stochastic mirror descent (SMD) algorithm is a general class of training algorithms, which includes the celebrated stochastic gradient descent (SGD) as a special case. It utilizes a mirror potential to influence the implicit bias of the training algorithm. In this paper we explore the performance of the SMD iterates on mean-field ensemble models. Our results generalize earlier ones obtained for SGD on such models. The evolution of the distribution of parameters is mapped to a continuous time process in the space of probability distributions. Our main result gives a nonlinear partial differential equation to which the continuous time process converges in the asymptotic regime of large networks. The impact of the mirror potential appears through a multiplicative term that is equal to the inverse of its Hessian and which can be interpreted as defining a gradient flow over an…
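To make the SMD update concrete, here is a minimal sketch of one possible instance, assuming the q-norm potential ψ(w) = (1/q) Σ|w_i|^q with q > 1 and a single linear model with squared loss. The potential, the toy data, and all function names are assumptions chosen for illustration; the paper's setting is an average ensemble model, which this sketch does not implement. With q = 2 the update reduces to plain SGD.

```python
import numpy as np

def grad_potential(w, q):
    """Mirror map: gradient of psi(w) = (1/q) * sum(|w_i|^q), for q > 1."""
    return np.sign(w) * np.abs(w) ** (q - 1)

def grad_potential_inv(z, q):
    """Inverse of the mirror map, applied elementwise."""
    return np.sign(z) * np.abs(z) ** (1.0 / (q - 1))

def smd_step(w, grad, lr, q):
    """One SMD step: take the gradient step in the mirror (dual) domain, then map back."""
    z = grad_potential(w, q) - lr * grad
    return grad_potential_inv(z, q)

def squared_loss_grad(w, x, y):
    """Gradient of 0.5 * (w @ x - y)^2 with respect to w (hypothetical toy loss)."""
    return (w @ x - y) * x

# Toy data for illustration only: noiseless linear regression.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(200, 3))
Y = X @ w_true

# One pass over the data with a non-Euclidean potential (q = 3).
w = np.full(3, 0.1)
for x, y in zip(X, Y):
    w = smd_step(w, squared_loss_grad(w, x, y), lr=0.05, q=3.0)
```

Changing `q` changes only the mirror map, not the loss or the data, which is the sense in which the potential shapes the trajectory of the iterates.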

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Neural Networks and Applications

Methods: Stochastic Gradient Descent