Regularized Risk Minimization by Nesterov's Accelerated Gradient Methods: Algorithmic Extensions and Empirical Studies
Xinhua Zhang, Ankan Saha, S.V.N. Vishwanathan

TL;DR
This paper extends Nesterov's accelerated gradient methods to handle strongly convex and composite functions, providing a unified framework that improves convergence and empirical performance on max-margin models.
Contribution
The authors develop a unifying AGM framework with adaptive Lipschitz tuning and duality gap bounds, enhancing its applicability and efficiency for machine learning tasks.
Findings
AGM outperforms state-of-the-art solvers on max-margin models
Framework covers both $\infty$-memory and 1-memory AGM styles
Analysis emphasizes rates of convergence and efficient gradient computation
Abstract
Nesterov's accelerated gradient methods (AGM) have been successfully applied in many machine learning areas. However, their empirical performance on training max-margin models has been inferior to existing specialized solvers. In this paper, we first extend AGM to strongly convex and composite objective functions with Bregman-style prox-functions. Our unifying framework covers both the $\infty$-memory and 1-memory styles of AGM, tunes the Lipschitz constant adaptively, and bounds the duality gap. Then we demonstrate various ways to apply this framework of methods to a wide range of machine learning problems. Emphasis is placed on their rate of convergence and on how to efficiently compute the gradient and optimize the models. The experimental results show that with our extensions AGM outperforms state-of-the-art solvers on max-margin models.
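To make the idea of an accelerated method on a composite objective with an adaptively tuned Lipschitz constant more concrete, here is a minimal sketch. It is not the paper's framework: it assumes a textbook FISTA-style scheme with a Euclidean prox (no Bregman prox-function), an L1 regularizer, and a simple doubling rule for the Lipschitz estimate. The function names (`soft_threshold`, `accelerated_prox_gradient`) and the toy diagonal quadratic are hypothetical and chosen only for illustration.

```python
import math

def soft_threshold(v, t):
    """Proximal operator of t * |.|, applied to each coordinate of v."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]

def accelerated_prox_gradient(f, grad_f, lam, w0, L0=1.0, iters=2000):
    """FISTA-style sketch for min_w f(w) + lam * ||w||_1.

    The Lipschitz estimate L starts at L0 and is doubled whenever the
    quadratic upper bound on f fails at the trial point (backtracking).
    """
    w, y = list(w0), list(w0)
    t, L = 1.0, L0
    for _ in range(iters):
        g, fy = grad_f(y), f(y)
        while True:
            # Proximal gradient step from the extrapolated point y.
            w_new = soft_threshold([yi - gi / L for yi, gi in zip(y, g)], lam / L)
            d = [a - b for a, b in zip(w_new, y)]
            bound = (fy + sum(gi * di for gi, di in zip(g, d))
                     + 0.5 * L * sum(di * di for di in d))
            if f(w_new) <= bound + 1e-12:
                break
            L *= 2.0  # estimate was too small: increase and retry
        # Momentum (extrapolation) update.
        t_new = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))
        beta = (t - 1.0) / t_new
        y = [wn + beta * (wn - wo) for wn, wo in zip(w_new, w)]
        w, t = w_new, t_new
    return w, L

# Toy problem (an assumption for illustration): a diagonal quadratic
# f(w) = 0.5 * sum_i a_i * (w_i - b_i)^2 with an L1 penalty.
a = [1.0, 4.0, 10.0]
b = [3.0, -0.5, 0.05]
lam = 1.0

def f(w):
    return 0.5 * sum(ai * (wi - bi) ** 2 for ai, wi, bi in zip(a, w, b))

def grad_f(w):
    return [ai * (wi - bi) for ai, wi, bi in zip(a, w, b)]

w, L = accelerated_prox_gradient(f, grad_f, lam, [0.0, 0.0, 0.0])

# Because the toy problem is separable, its minimizer has a closed form.
w_star = [math.copysign(max(abs(bi) - lam / ai, 0.0), bi) for ai, bi in zip(a, b)]
```

For this separable toy problem the closed-form minimizer is `[2.0, -0.25, 0.0]`, so `w` can be compared against `w_star` directly. The doubling rule never pushes `L` past twice the smallest power of two that dominates the true Lipschitz constant of the gradient (10 here), which is why no constant has to be supplied up front.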
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Machine Learning and ELM
