First order online optimisation using forward gradients in over-parameterised systems
Behnam Mafakheri, Iman Shames, Jonathan H. Manton

TL;DR
This paper analyzes the effectiveness of first-order online optimization using forward gradients in over-parameterized, non-convex systems, providing convergence bounds and resource trade-offs.
Contribution
It introduces a novel analysis of forward gradient methods for time-varying non-convex problems, deriving convergence bounds and resource-aware iteration strategies.
Findings
Linear convergence to solutions or neighborhoods of solutions.
The convergence rate decreases as the problem dimension increases.
Forward gradient iterations can be tuned for resource constraints.
Abstract
The success of deep learning over the past decade mainly relies on gradient-based optimisation and backpropagation. This paper focuses on analysing the performance of first-order gradient-based optimisation algorithms, namely gradient descent and proximal gradient, with a time-varying non-convex cost function under the (proximal) Polyak-Łojasiewicz condition. Specifically, we focus on using the forward mode of automatic differentiation to compute gradients in fast-changing problems where calculating gradients using the backpropagation algorithm is either impossible or inefficient. Upper bounds for tracking and asymptotic errors are derived for various cases, showing linear convergence to a solution or a neighbourhood of an optimal solution, where the convergence rate decreases as the dimension of the problem increases. We show that for a solver with constraints on computing…
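To make the forward-gradient idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes a standard Gaussian probe direction `v`, a hypothetical static over-parameterised least-squares cost (two equations, four unknowns, which satisfies a Polyak-Łojasiewicz condition), and a small hand-written dual-number class standing in for a forward-mode automatic differentiation library. The names `Dual`, `A`, `B`, `loss`, `forward_gradient`, and `run` are all illustrative.

```python
import random


class Dual:
    """Number carrying a value and a tangent (forward-mode AD)."""

    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.tan + o.tan)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.tan - o.tan)

    def __mul__(self, other):
        o = Dual.lift(other)
        # Product rule for the tangent part.
        return Dual(self.val * o.val, self.tan * o.val + self.val * o.tan)

    __rmul__ = __mul__


# Hypothetical over-parameterised problem: 2 equations, 4 unknowns.
A = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
B = [1.0, -2.0]


def loss(x):
    """0.5 * ||A x - B||^2; works on floats or Dual numbers."""
    total = 0.0
    for row, b in zip(A, B):
        r = sum(a * xj for a, xj in zip(row, x)) - b
        total = total + 0.5 * (r * r)
    return total


def forward_gradient(x, rng):
    """One forward pass gives the directional derivative d = grad(x) . v;
    the estimate d * v equals the gradient in expectation (assumed Gaussian v)."""
    v = [rng.gauss(0.0, 1.0) for _ in x]
    d = loss([Dual(xi, vi) for xi, vi in zip(x, v)]).tan
    return [d * vi for vi in v]


def run(steps=500, seed=0):
    rng = random.Random(seed)
    x = [0.0] * 4
    n = len(x)
    # The squared Frobenius norm upper-bounds the smoothness constant of the loss.
    fro2 = sum(a * a for row in A for a in row)
    step = 1.0 / (fro2 * (n + 2))  # step shrinks as the dimension n grows
    for _ in range(steps):
        g = forward_gradient(x, rng)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


x_final = run()
print(loss([0.0] * 4), loss(x_final))
```

The step size is scaled by 1/(n + 2) because, for a standard Gaussian `v`, the expected squared norm of the estimate is (n + 2) times the squared norm of the true gradient. That scaling is one simple way to see why the rate degrades with dimension; the bounds derived in the paper may take a different form.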
Taxonomy
Topics: Distributed Control Multi-Agent Systems · Sparse and Compressive Sensing Techniques · Optimization and Variational Analysis
