Implicit regularisation in stochastic gradient descent: from single-objective to two-player games
Mihaela Rosca, Marc Peter Deisenroth

TL;DR
This paper introduces a new way of using backward error analysis (BEA) to construct continuous-time flows whose vector fields can be written as gradients, revealing new implicit regularisation effects in stochastic gradient descent and two-player games.
Contribution
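The paper develops a new way of applying backward error analysis (BEA) to construct continuous-time flows whose vector fields can be written as gradients. This makes it possible to write down modified losses and so expose previously unknown implicit regularisers in stochastic gradient descent and two-player games.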
Findings
Identified implicit regularisation effects induced by multiple SGD steps, accounting for the exact data batches used in the updates.
Extended analysis to differentiable two-player games.
Provided a method to construct flows with gradient vector fields using BEA.
Abstract
Recent years have seen many insights on deep learning optimisation being brought forward by finding implicit regularisation effects of commonly used gradient-based optimisers. Understanding implicit regularisation can not only shed light on optimisation dynamics, but it can also be used to improve performance and stability across problem domains, from supervised learning to two-player games such as Generative Adversarial Networks. An avenue for finding such implicit regularisation effects has been quantifying the discretisation errors of discrete optimisers via continuous-time flows constructed by backward error analysis (BEA). The current usage of BEA is not without limitations, since not all the vector fields of continuous-time flows obtained using BEA can be written as a gradient, hindering the construction of modified losses revealing implicit regularisers. In this work, we provide…
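To make the idea of quantifying discretisation error with a BEA-derived flow concrete, the sketch below uses the best-known prior result for full-batch gradient descent with learning rate h: the discrete iterates follow the gradient flow of the modified loss L + (h/4)·||∇L||² more closely than the gradient flow of L itself. This formula comes from earlier work on implicit gradient regularisation and is an assumption here, not something stated in the abstract above and not this paper's new method. The quadratic loss, the function names, and the parameter values are all hypothetical choices made so that both flows have closed-form solutions.

```python
import math


def gd_step(theta, a, h):
    """One gradient descent step on L(theta) = 0.5 * a * theta**2."""
    return theta - h * a * theta


def flow(theta0, rate, t):
    """Exact solution of d(theta)/dt = -rate * theta after time t."""
    return theta0 * math.exp(-rate * t)


def one_step_errors(theta0=1.0, a=2.0, h=0.1):
    """Distance between one GD step and each continuous-time flow at time h."""
    discrete = gd_step(theta0, a, h)
    # Gradient flow of the original loss decays at rate a.
    err_original = abs(discrete - flow(theta0, a, h))
    # Assumed modified loss: L + (h / 4) * (dL/dtheta)**2.
    # For this quadratic it equals 0.5 * (a + h * a**2 / 2) * theta**2,
    # so its gradient flow decays at rate a + h * a**2 / 2.
    err_modified = abs(discrete - flow(theta0, a + h * a**2 / 2, h))
    return err_original, err_modified


if __name__ == "__main__":
    for h in (0.1, 0.05, 0.025):
        e0, e1 = one_step_errors(h=h)
        print(f"h={h}: original flow error={e0:.2e}, modified flow error={e1:.2e}")
```

Halving h shrinks the one-step error against the original flow by roughly a factor of four, and against the modified flow by roughly a factor of eight, which is the sense in which the modified flow tracks the discrete optimiser to higher order. The extra term (h/4)·||∇L||² is the implicit regulariser in this simple setting.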
Taxonomy
Topics: Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research
