# Gradient flows and proximal splitting methods: A unified view on accelerated and stochastic optimization

**Authors:** Guilherme França, Daniel P. Robinson, René Vidal

arXiv: 1908.00865 · 2021-05-11

## TL;DR

This paper unifies various proximal algorithms and acceleration techniques in optimization by showing they are discretizations of a single gradient flow differential equation, extending to stochastic settings and connecting to physical dissipative systems.

## Contribution

The paper shows that the five known proximal algorithms are different discretizations of a single differential equation, the gradient flow, and derives accelerated variants and stochastic extensions of them within the same framework.

## Key findings

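- The five known proximal algorithms (forward-backward splitting, Tseng splitting, Douglas-Rachford, ADMM, and Davis-Yin) are different discretizations of a single gradient flow equation.
- Applying similar discretizations to an accelerated gradient flow (Newton's equation with an additional dissipative force) yields accelerated variants of all these algorithms; most are new, and some recover known methods.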
- The framework extends to stochastic optimization, linking to Langevin and Fokker-Planck equations.
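To make the "discretization" viewpoint concrete, here is a minimal sketch of forward-backward splitting read as a semi-implicit Euler scheme for the gradient flow of `f + g`: an explicit (forward) step on the smooth part `f`, followed by an implicit (backward) step on the nonsmooth part `g`, which is exactly a proximal operator. The test problem (least squares plus an L1 penalty), the function names, and the step size `h` are illustrative assumptions, not taken from the paper, and the paper's own derivation may differ in detail.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, which has this closed form."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def forward_backward(A, b, lam, h, iters):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 (illustrative problem).

    Each iteration discretizes dx/dt = -grad f(x) - subgrad g(x) with
    step size h: explicit in f, implicit in g.
    """
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                   # forward (explicit Euler) step on f
        x = soft_threshold(x - h * grad, h * lam)  # backward (implicit Euler) step on g
    return x


if __name__ == "__main__":
    # With A = I the minimizer is soft_threshold(b, lam).
    b = np.array([3.0, -0.5, 1.0])
    print(forward_backward(np.eye(3), b, lam=1.0, h=0.5, iters=200))
```

Setting `lam = 0` reduces the loop to plain gradient descent (forward Euler), while dropping the gradient term leaves a pure proximal step (backward Euler).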

## Abstract

Optimization is at the heart of machine learning, statistics and many applied scientific disciplines. It also has a long history in physics, ranging from the minimal action principle to finding ground states of disordered systems such as spin glasses. Proximal algorithms form a class of methods that are broadly applicable and are particularly well-suited to nonsmooth, constrained, large-scale, and distributed optimization problems. There are essentially five proximal algorithms currently known: Forward-backward splitting, Tseng splitting, Douglas-Rachford, alternating direction method of multipliers, and the more recent Davis-Yin. These methods sit on a higher level of abstraction compared to gradient-based ones, with deep roots in nonlinear functional analysis. We show that all of these methods are actually different discretizations of a single differential equation, namely, the simple gradient flow which dates back to Cauchy (1847). An important aspect behind many of the success stories in machine learning relies on "accelerating" the convergence of first-order methods. We show that similar discretization schemes applied to Newton's equation with an additional dissipative force, which we refer to as accelerated gradient flow, allow us to obtain accelerated variants of all these proximal algorithms -- the majority of which are new although some recover known cases in the literature. Furthermore, we extend these methods to stochastic settings, allowing us to make connections with Langevin and Fokker-Planck equations. Similar ideas apply to gradient descent, heavy ball, and Nesterov's method which are simpler. Our results therefore provide a unified framework from which several important optimization methods are nothing but simulations of classical dissipative systems.
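For orientation, the three continuous-time systems named in the abstract can be written in a standard textbook form as below. The notation is an assumption made for this summary (unit mass, a damping coefficient $\eta(t)$, an inverse temperature $\beta$, and a Wiener process $W_t$) and may not match the symbols used in the paper.

```latex
% Gradient flow (Cauchy, 1847)
\dot{x}(t) = -\nabla f\bigl(x(t)\bigr)

% Accelerated gradient flow: Newton's equation with a dissipative force
\ddot{x}(t) + \eta(t)\,\dot{x}(t) = -\nabla f\bigl(x(t)\bigr)

% Stochastic counterpart of the gradient flow: overdamped Langevin equation
dX_t = -\nabla f(X_t)\,dt + \sqrt{2/\beta}\;dW_t
```

The density of $X_t$ in the last equation evolves according to the corresponding Fokker-Planck equation, which is the connection the abstract refers to.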

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.00865/full.md

## Figures

30 figures with captions in the complete paper: https://tomesphere.com/paper/1908.00865/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/1908.00865/full.md

---
Source: https://tomesphere.com/paper/1908.00865