Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent
Zhiyuan Li, Tianhao Wang, Jason D. Lee, Sanjeev Arora

TL;DR
This paper characterizes when gradient descent on a reparametrized model is equivalent to mirror descent. Using the notion of a commuting parametrization, it unifies the previous results in this setting within a single theoretical framework.
Contribution
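It introduces the notion of commuting parametrization and shows that gradient flow with any such parametrization is equivalent to continuous mirror descent with a related Legendre function. The converse direction, from continuous mirror descent back to gradient flow, relies on Nash's embedding theorem.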
Findings
Gradient flow with any commuting parametrization is equivalent to continuous mirror descent with a related Legendre function.
Conversely, continuous mirror descent with any Legendre function can be viewed as gradient flow with a related commuting parametrization.
The results unify previous findings on implicit bias in overparametrized models.
Abstract
As part of the effort to understand implicit bias of gradient descent in overparametrized models, several results have shown how the training trajectory on the overparametrized model can be understood as mirror descent on a different objective. The main result here is a characterization of this phenomenon under a notion termed commuting parametrization, which encompasses all the previous results in this setting. It is shown that gradient flow with any commuting parametrization is equivalent to continuous mirror descent with a related Legendre function. Conversely, continuous mirror descent with any Legendre function can be viewed as gradient flow with a related commuting parametrization. The latter result relies upon Nash's embedding theorem.
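A small numerical sketch may help make the forward direction concrete. It uses the elementwise quadratic parametrization x = u * u, a standard example of this setting that is not stated in the abstract itself. Under gradient flow on u, the chain rule gives dx/dt = -4 x * grad L(x), which matches continuous mirror descent with R(x) = (1/4) * sum(x log x - x), since grad R(x) = (1/4) log x. The loss, the data `a` and `b`, the step size, and all function names below are illustrative assumptions. Both loops are discretizations, so they only approximate the continuous flows and agree up to an error that shrinks with the step size.

```python
import math

def grad_loss(x, a, b):
    """Gradient of the toy loss L(x) = 0.5 * (a.x - b)^2 with respect to x."""
    r = sum(ai * xi for ai, xi in zip(a, x)) - b
    return [r * ai for ai in a]

def gd_on_u(u0, a, b, eta, steps):
    """Gradient descent on u for the loss L(u * u), where x = u * u elementwise."""
    u = list(u0)
    for _ in range(steps):
        x = [ui * ui for ui in u]
        g = grad_loss(x, a, b)
        # Chain rule: dL/du_i = 2 * u_i * dL/dx_i
        u = [ui - eta * 2.0 * ui * gi for ui, gi in zip(u, g)]
    return [ui * ui for ui in u]

def mirror_descent_on_x(x0, a, b, eta, steps):
    """Mirror descent on x with R(x) = (1/4) * sum(x log x - x)."""
    x = list(x0)
    for _ in range(steps):
        g = grad_loss(x, a, b)
        # Dual update: (1/4) log x_new = (1/4) log x - eta * g
        x = [xi * math.exp(-4.0 * eta * gi) for xi, gi in zip(x, g)]
    return x

# Hypothetical underdetermined problem: one linear measurement, three unknowns.
a, b = [1.0, 2.0, 3.0], 2.0
u0 = [0.5, 0.5, 0.5]
x0 = [ui * ui for ui in u0]
eta, steps = 1e-4, 20000

x_gd = gd_on_u(u0, a, b, eta, steps)
x_md = mirror_descent_on_x(x0, a, b, eta, steps)

# Largest coordinate-wise difference between the two end points.
gap = max(abs(p - q) for p, q in zip(x_gd, x_md))
print("x from GD on u:      ", x_gd)
print("x from mirror descent:", x_md)
print("max gap:", gap)
```

Both runs should reach a point with a.x close to b, and the gap between them should shrink further as `eta` is reduced, which is the discrete shadow of the continuous-time equivalence described above.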
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Gaussian Processes and Bayesian Inference
