Accelerating Proximal Gradient-type Algorithms using Damped Anderson Acceleration with Restarts and Nesterov Initialization
Nicholas C. Henderson, Ravi Varadhan

TL;DR
This paper introduces a two-phase acceleration scheme for proximal gradient algorithms that combines Nesterov's momentum and Anderson acceleration with restarts, significantly improving convergence speed in large-scale optimization.
Contribution
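The paper proposes a general two-phase scheme for proximal gradient methods: Nesterov's momentum drives an initialization phase with fast initial descent that is robust to the starting value, and a variant of Anderson acceleration with restarts then delivers faster local convergence.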
Findings
Substantially faster convergence than existing methods.
Restart criteria developed for Nesterov acceleration identify when to switch from the first phase to the second.
Improved performance on sparse optimization problems.
Abstract
Despite their frequent slow convergence, proximal gradient schemes are widely used in large-scale optimization tasks due to their tremendous stability, scalability, and ease of computation. In this paper, we develop and investigate a general two-phase scheme for accelerating the convergence of proximal gradient algorithms. By using Nesterov's momentum method in an initialization phase, our procedure delivers fast initial descent that is robust to the choice of starting value. Once iterates are much closer to the solution after the first phase, we utilize a variation of Anderson acceleration to deliver more rapid local convergence in the second phase. Drawing upon restarting schemes developed for Nesterov acceleration, we can readily identify points where it is advantageous to switch from the first to the second phase, which enables use of the procedure without requiring one to specify…
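The two-phase idea described in the abstract can be sketched in a few lines. The code below is an illustrative reading, not the authors' implementation: it assumes a lasso objective as the test problem, assumes the switch happens at the first point where a gradient-style restart test for Nesterov momentum fires, and uses a simple damped (ridge-regularized) Anderson update with periodic restarts and an objective-decrease safeguard. All function names and the parameters `memory`, `damping`, and `tol` are hypothetical.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def make_lasso(A, b, lam):
    """Objective and proximal gradient map for 0.5*||Ax-b||^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L, L = largest eigenvalue of A^T A

    def objective(x):
        r = A @ x - b
        return 0.5 * (r @ r) + lam * np.abs(x).sum()

    def prox_grad(x):
        return soft_threshold(x - step * (A.T @ (A @ x - b)), step * lam)

    return objective, prox_grad

def nesterov_phase(prox_grad, x0, max_iter=500):
    """Phase 1: Nesterov momentum until a restart test fires (assumed switch rule)."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for k in range(max_iter):
        x_new = prox_grad(y)
        # Gradient-style restart test: momentum now points uphill.
        if (y - x_new) @ (x_new - x) > 0:
            return x_new, k + 1
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x, max_iter

def anderson_phase(objective, prox_grad, x0, memory=5, damping=1e-8,
                   tol=1e-8, max_iter=500):
    """Phase 2: damped Anderson acceleration with restarts (simplified sketch)."""
    x = x0.copy()
    f = prox_grad(x) - x          # fixed-point residual
    dX, dF = [], []               # histories of iterate and residual differences
    for k in range(max_iter):
        if np.linalg.norm(f) <= tol:
            return x, k
        x_new = x + f             # plain proximal gradient step (fallback)
        if dF:
            F, X = np.column_stack(dF), np.column_stack(dX)
            m = F.shape[1]
            reg = damping * np.linalg.norm(F) ** 2
            # Damped least squares: min ||f - F g||^2 + reg * ||g||^2.
            lhs = np.vstack([F, np.sqrt(reg) * np.eye(m)])
            rhs = np.concatenate([f, np.zeros(m)])
            gamma = np.linalg.lstsq(lhs, rhs, rcond=None)[0]
            x_aa = x + f - (X + F) @ gamma
            # Safeguard: accept the extrapolation only if it does not
            # increase the objective.
            if objective(x_aa) <= objective(x):
                x_new = x_aa
        f_new = prox_grad(x_new) - x_new
        if len(dF) == memory:
            dX, dF = [], []       # restart: discard the whole history
        else:
            dX.append(x_new - x)
            dF.append(f_new - f)
        x, f = x_new, f_new
    return x, max_iter

# Small synthetic demo (data are made up for illustration).
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true + 0.01 * rng.standard_normal(40)
objective, prox_grad = make_lasso(A, b, lam=0.1)

x0 = np.zeros(20)
x_switch, n1 = nesterov_phase(prox_grad, x0)
x_hat, n2 = anderson_phase(objective, prox_grad, x_switch)
print(f"phase 1 iterations: {n1}, phase 2 iterations: {n2}")
print(f"final fixed-point residual: {np.linalg.norm(prox_grad(x_hat) - x_hat):.2e}")
```

Because the fallback step is an ordinary proximal gradient step with step size 1/L, the objective never increases during the second phase of this sketch. The paper's damping rule and switching criterion may differ from the choices assumed here.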
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Machine Learning and ELM
