Optimizers Qualitatively Alter Solutions And We Should Leverage This

Razvan Pascanu; Clare Lyle; Ionut-Vlad Modoranu; Naima Elosegui Borras; Dan Alistarh; Petar Velickovic; Sarath Chandar; Soham De; James Martens

arXiv:2507.12224·cs.LG·July 17, 2025

TL;DR

This paper emphasizes that optimizers in deep learning influence not only convergence speed but also the qualitative properties of solutions, advocating for research into optimizer biases to improve model outcomes.

Contribution

The paper highlights the importance of understanding and leveraging optimizer-induced biases to shape the qualitative properties of solutions in deep neural networks.

Findings

1. Optimizers encode inductive biases affecting solution properties.

2. The optimizer influences the effective expressivity of models.

3. Research should focus on designing optimizers to induce desired solution characteristics.
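
The first finding can be illustrated with a toy problem. The sketch below is not taken from the paper: the loss, the vector A, the target B, the step sizes, and the choice of a sign-based update as the second optimizer are all assumptions made for illustration. The loss has a whole line of global minima, and both optimizers start from the origin, yet each settles on a different point of that line.

```python
# Toy illustration (an assumption for this page, not an experiment from the paper).
# Loss: 0.5 * (A . w - B)^2 with one equation and two unknowns, so every w
# on the line A . w = B is a global minimum with zero loss.

A = (1.0, 2.0)  # hypothetical feature vector
B = 2.0         # hypothetical target


def residual(w):
    """Return A . w - B; the gradient of the loss is residual(w) * A."""
    return A[0] * w[0] + A[1] * w[1] - B


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def gradient_descent(steps=200, lr=0.1):
    """Plain gradient descent started at the origin."""
    w = [0.0, 0.0]
    for _ in range(steps):
        r = residual(w)
        w = [w[i] - lr * r * A[i] for i in range(2)]
    return w


def sign_descent(steps=2000, lr0=0.5):
    """Sign-of-gradient descent started at the origin, with a 1/(k+1) step decay."""
    w = [0.0, 0.0]
    for k in range(steps):
        r = residual(w)
        lr = lr0 / (k + 1)
        w = [w[i] - lr * sign(r * A[i]) for i in range(2)]
    return w


w_gd = gradient_descent()
w_sign = sign_descent()
print("gradient descent:", w_gd, "residual:", residual(w_gd))
print("sign descent:    ", w_sign, "residual:", residual(w_sign))
```

In this setup gradient descent stays in the direction of A and approaches the minimum Euclidean norm solution (0.4, 0.8), while the sign-based update moves both coordinates by equal amounts and approaches a solution near (2/3, 2/3), which has the smallest maximum coordinate. Both reach essentially zero loss, so the difference lies in which solution is selected, not in how well the loss is minimized.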

Abstract

Due to the nonlinear nature of Deep Neural Networks (DNNs), one cannot guarantee convergence to a unique global minimum of the loss when using optimizers relying only on local information, such as SGD. Indeed, this was a primary source of skepticism regarding the feasibility of DNNs in the early days of the field. The past decades of progress in deep learning have revealed this skepticism to be misplaced, and a large body of empirical evidence shows that sufficiently large DNNs following standard training protocols exhibit well-behaved optimization dynamics that converge to performant solutions. This success has biased the community to use convex optimization as a mental model for learning, leading to a focus on training efficiency, whether in terms of required iterations, FLOPs, or wall-clock time, when improving optimizers. We argue that, while this perspective has proven extremely…

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Explainable Artificial Intelligence (XAI)