A Derivation of Nesterov's Accelerated Gradient Algorithm from Optimal Control Theory
I. M. Ross

TL;DR
This paper derives Nesterov's accelerated gradient algorithm from optimal control theory, providing a first-principles explanation and connecting it to continuous-time dynamical systems.
Contribution
The paper introduces a derivation of Nesterov's algorithm from optimal control principles, linking the discrete algorithm to a continuous-time dynamical system through an Euler discretization.
Findings
Derives Nesterov's algorithm from optimal control theory
Connects accelerated optimization to controllable dynamical systems
Obtains Nesterov's algorithm as an Euler discretization of the differential equation produced by stabilizing that system with a quadratic control Lyapunov function
Abstract
Nesterov's accelerated gradient algorithm is derived from first principles. The first principles are founded on the recently developed optimal control theory for optimization. This theory frames an optimization problem as an optimal control problem whose trajectories generate various continuous-time algorithms. The algorithmic trajectories satisfy the necessary conditions for optimal control. The necessary conditions produce a controllable dynamical system for accelerated optimization. Stabilizing this system via a quadratic control Lyapunov function generates an ordinary differential equation. An Euler discretization of the resulting differential equation produces Nesterov's algorithm. In this context, the result resolves the purported mystery surrounding the algorithm.
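For orientation, the iteration that the abstract's final step arrives at can be sketched in its standard textbook form. The sketch below is not the paper's derivation and does not reproduce its differential equation. It is a minimal illustration of the commonly stated Nesterov update (a look-ahead point followed by a gradient step), compared against plain gradient descent. The test function, the step size 1/L, the constant momentum (sqrt(kappa) - 1)/(sqrt(kappa) + 1), and all function names are assumptions chosen for the example.

```python
import math

# Assumed test problem: f(x) = 0.5 * sum(a_i * x_i^2), an ill-conditioned quadratic.
def f(x, a):
    return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))

def grad(x, a):
    return [ai * xi for ai, xi in zip(a, x)]

def gradient_descent(x0, a, step, iters):
    """Plain gradient descent, used only as a baseline."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x, a)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

def nesterov(x0, a, step, momentum, iters):
    """Textbook Nesterov update with constant momentum (illustrative form)."""
    x = list(x0)
    x_prev = list(x0)
    for _ in range(iters):
        # Look-ahead point: extrapolate along the previous displacement.
        y = [xi + momentum * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        # Gradient step taken from the look-ahead point, not from x.
        g = grad(y, a)
        x_prev = x
        x = [yi - step * gi for yi, gi in zip(y, g)]
    return x

a = [1.0, 100.0]                      # curvatures: mu = 1, L = 100
L_const, mu = max(a), min(a)
step = 1.0 / L_const                  # assumed step size 1/L
kappa = L_const / mu                  # condition number
momentum = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)

x0 = [1.0, 1.0]
x_gd = gradient_descent(x0, a, step, 100)
x_nag = nesterov(x0, a, step, momentum, 100)

print("f after gradient descent:", f(x_gd, a))
print("f after Nesterov:        ", f(x_nag, a))
```

With the same step size and iteration count, the momentum iteration reaches a much smaller objective value on this problem than gradient descent does, which is the acceleration that the paper sets out to explain from first principles.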
Taxonomy
Topics: Optimization and Variational Analysis · Advanced Control Systems Optimization · Advanced Optimization Algorithms Research
