Accelerated SGD for Non-Strongly-Convex Least Squares
Aditya Varre, Nicolas Flammarion

TL;DR
This paper introduces a practical accelerated stochastic gradient descent algorithm for non-strongly convex least squares regression that achieves the optimal noise-dependent prediction error rate while accelerating the forgetting of the initial conditions.
Contribution
It presents the first practical accelerated SGD algorithm for non-strongly convex least squares that attains the optimal dependence on the noise, with convergence guarantees for both the averaged and the last iterate.
Findings
Achieves the optimal $O(d/t)$ prediction error rate in its dependence on the noise.
Accelerates the forgetting of the initial conditions to $O(d/t^2)$.
Proves optimality with a matching lower bound in the noiseless setting.
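The rates above are stated for the prediction error of least squares. For reference, here is a standard formulation of that problem; the notation ($H$, $g_t$, $\theta_*$) is assumed here for illustration and is not taken from the paper.

```latex
\min_{\theta \in \mathbb{R}^d} \; f(\theta) = \tfrac{1}{2}\,\mathbb{E}_{(x,y)}\!\left[(\langle x, \theta\rangle - y)^2\right],
\qquad H = \mathbb{E}\!\left[x x^\top\right],

% single-sample (unbiased) stochastic gradient at step t
g_t(\theta) = (\langle x_t, \theta\rangle - y_t)\, x_t,
\qquad \mathbb{E}\!\left[g_t(\theta)\right] = \nabla f(\theta) = H\theta - \mathbb{E}[y\,x],

% prediction error (excess risk) relative to a minimizer \theta_*
f(\theta) - f(\theta_*) = \tfrac{1}{2}\,(\theta - \theta_*)^\top H\,(\theta - \theta_*).
```

"Non-strongly convex" means no bound $H \succeq \mu I$ with $\mu > 0$ is assumed: the smallest eigenvalues of $H$ may be zero or arbitrarily small, so rates must not depend on $1/\mu$.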
Abstract
We consider stochastic approximation for the least squares regression problem in the non-strongly convex setting. We present the first practical algorithm that achieves the optimal prediction error rates in terms of dependence on the noise of the problem, as $O(d/t)$, while accelerating the forgetting of the initial conditions to $O(d/t^2)$. Our new algorithm is based on a simple modification of the accelerated gradient descent. We provide convergence results for both the averaged and the last iterate of the algorithm. In order to describe the tightness of these new bounds, we present a matching lower bound in the noiseless setting and thus show the optimality of our algorithm.
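The abstract describes the method as a modification of accelerated gradient descent. For orientation only, the sketch below runs the unmodified, deterministic Nesterov scheme (step size $1/L$, momentum $(k-1)/(k+2)$) on an empirical least-squares objective. It is not the authors' stochastic algorithm; the function name `nesterov_least_squares` is hypothetical. In the stochastic setting the exact gradient `H @ nu - b` would be replaced by a single-sample estimate $(\langle x_t, \nu\rangle - y_t)\,x_t$, and the paper's modification, which targets that setting, is not reproduced here.

```python
import numpy as np


def nesterov_least_squares(X, y, n_steps):
    """Deterministic Nesterov accelerated gradient descent on the
    empirical least-squares objective f(theta) = 0.5 * mean((X @ theta - y)**2),
    started at theta_0 = 0 with step size 1/L.

    Illustrative sketch: NOT the paper's stochastic algorithm.
    Returns the iterates [theta_0, theta_1, ..., theta_{n_steps}].
    """
    n, d = X.shape
    H = X.T @ X / n                  # empirical covariance = Hessian of f
    b = X.T @ y / n                  # so that grad f(theta) = H @ theta - b
    L = np.linalg.eigvalsh(H)[-1]    # smoothness constant: largest eigenvalue of H
    theta_prev = np.zeros(d)
    theta = np.zeros(d)
    nu = np.zeros(d)                 # look-ahead point, nu_0 = theta_0
    iterates = [theta.copy()]
    for k in range(1, n_steps + 1):
        # Gradient step taken from the look-ahead point (no strong convexity needed).
        theta_prev, theta = theta, nu - (H @ nu - b) / L
        # Momentum extrapolation with coefficient (k - 1) / (k + 2).
        nu = theta + (k - 1) / (k + 2) * (theta - theta_prev)
        iterates.append(theta.copy())
    return iterates
```

For any minimizer $\theta_*$, this scheme satisfies $f(\theta_k) - f(\theta_*) \le 2L\,\|\theta_0 - \theta_*\|^2/(k+1)^2$ even when $H$ is singular. This $1/k^2$ forgetting of the initial condition is the deterministic behavior that the Findings above report recovering in the stochastic setting, as $O(d/t^2)$.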
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning
