A convergence analysis of the perturbed compositional gradient flow: averaging principle and normal deviations

Wenqing Hu; Chris Junchi Li

arXiv:1709.00515·math.PR·July 26, 2018

TL;DR

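This paper analyzes the perturbed compositional gradient flow, a system of two stochastic differential equations with separated fast and slow scales. It shows that the slow component converges to the solution of an averaged ODE and that its rescaled deviations from that ODE converge to a stochastic process with Gaussian inputs, so the slow motion can be approximated in the weak sense by continuous-time stochastic gradient descent.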

Contribution

It provides a convergence analysis of the perturbed compositional gradient flow, identifies the flow as the diffusion limit of the SCGD algorithm, and shows that SCGD has the same asymptotic convergence time as classical SGD in the strongly convex case.

Findings

01

As the fast and slow scales separate, the slow motion converges to the solution of an averaged ordinary differential equation.

02

After proper rescaling, the deviation of the slow motion from the averaged equation converges to a stochastic process with Gaussian inputs.

03

The SCGD algorithm has the same asymptotic convergence time as classical SGD in the strongly convex case.

Abstract

We consider in this work a system of two stochastic differential equations named the perturbed compositional gradient flow. By introducing a separation of fast and slow scales of the two equations, we show that the limit of the slow motion is given by an averaged ordinary differential equation. We then demonstrate that the deviation of the slow motion from the averaged equation, after proper rescaling, converges to a stochastic process with Gaussian inputs. This indicates that the slow motion can be approximated in the weak sense by a standard perturbed gradient flow or the continuous-time stochastic gradient descent algorithm that solves the optimization problem for a composition of two functions. As an application, the perturbed compositional gradient flow corresponds to the diffusion limit of the Stochastic Composite Gradient Descent (SCGD) algorithm for minimizing a composition of…
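The two-scale structure described in the abstract can be illustrated with a minimal discrete-time sketch. It assumes the commonly stated basic SCGD updates (a fast running average y that tracks the inner function value, and a slow gradient step on x), which this page does not spell out. The toy problem, the function name scgd, the constant step sizes alpha and beta, and the noise model are all hypothetical choices for illustration, not the authors' setup.

```python
import numpy as np


def scgd(A, b, x0, alpha=0.005, beta=0.2, noise=0.1, steps=5000, seed=0):
    """Toy SCGD sketch (assumed form) for min_x f(E[g_w(x)]).

    Assumed toy model, not taken from the paper:
      inner map   g_w(x) = A x + w,               E[w] = 0
      outer map   f_v(y) = 0.5*||y - b||^2 + v.y,  E[v] = 0
    so the composite objective is 0.5*||A x - b||^2.
    """
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    x = np.asarray(x0, dtype=float).copy()
    y = A @ x  # start the tracker at the noiseless inner value
    for _ in range(steps):
        w = noise * rng.standard_normal(m)  # noise in the sampled inner map
        v = noise * rng.standard_normal(m)  # noise in the sampled outer gradient
        # Fast scale: running average of sampled inner values g_w(x).
        y = (1.0 - beta) * y + beta * (A @ x + w)
        # Slow scale: step along (Jacobian of g)^T times the sampled gradient
        # of f, evaluated at the tracker y rather than at g(x) itself.
        x = x - alpha * (A.T @ ((y - b) + v))
    return x, y


# Hypothetical strongly convex instance: A has full column rank.
A = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
x_star = np.array([1.0, -2.0])
b = A @ x_star  # makes x_star the exact minimizer of the composite objective
x_hat, y_hat = scgd(A, b, x0=np.zeros(2))
print("distance to minimizer:", np.linalg.norm(x_hat - x_star))
```

Choosing alpha much smaller than beta is what makes y the fast variable and x the slow one in this sketch. With constant step sizes the iterate does not converge exactly; it fluctuates in a small neighborhood of the minimizer.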
