SketchySGD: Reliable Stochastic Optimization via Randomized Curvature Estimates
Zachary Frangella, Pratik Rathore, Shipu Zhao, Madeleine Udell

TL;DR
SketchySGD augments stochastic gradient methods with randomized low-rank approximations to the subsampled Hessian and an automated stepsize. It converges faster than SGD on ill-conditioned least-squares problems and needs little hyperparameter tuning.
Contribution
The paper proposes SketchySGD, a stochastic optimization method that combines randomized low-rank approximations to the subsampled Hessian with an automated stepsize.
Findings
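With a fixed stepsize, SketchySGD converges linearly to a small ball around the optimum.
On ill-conditioned least-squares problems, it converges at a faster rate than SGD.
With default hyperparameters, it achieves comparable or better results than tuned popular stochastic gradient methods on ridge and logistic regression problems with dense and sparse data.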
Abstract
SketchySGD improves upon existing stochastic gradient methods in machine learning by using randomized low-rank approximations to the subsampled Hessian and by introducing an automated stepsize that works well across a wide range of convex machine learning problems. We show theoretically that SketchySGD with a fixed stepsize converges linearly to a small ball around the optimum. Further, in the ill-conditioned setting we show SketchySGD converges at a faster rate than SGD for least-squares problems. We validate this improvement empirically with ridge regression experiments on real data. Numerical experiments on both ridge and logistic regression problems with dense and sparse data show that SketchySGD equipped with its default hyperparameters can achieve comparable or better results than popular stochastic gradient methods, even when they have been tuned to yield their best performance.…
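To make the idea in the abstract concrete, here is a minimal sketch of stochastic gradient steps scaled by a randomized low-rank curvature estimate, written for ridge regression. It is not the authors' implementation. The function names (`nystrom_approx`, `apply_inverse`, `sketchy_sgd_ridge`), the choice of a randomized Nyström approximation, the regularization parameter `rho`, the once-per-epoch refresh of the curvature estimate, and the fixed stepsize `lr` are all assumptions made for illustration; the paper's automated stepsize is not reproduced here.

```python
import numpy as np


def nystrom_approx(hvp, dim, rank, rng):
    """Randomized Nystrom approximation H ~ U @ diag(lam) @ U.T of a PSD matrix H,
    accessed only through products hvp(V) = H @ V."""
    omega, _ = np.linalg.qr(rng.standard_normal((dim, rank)))  # orthonormal test matrix
    Y = hvp(omega)                                             # sketch of H
    # A tiny shift keeps the Cholesky factorization numerically stable.
    shift = np.sqrt(dim) * np.finfo(float).eps * np.linalg.norm(Y, 2)
    Y = Y + shift * omega
    M = omega.T @ Y
    L = np.linalg.cholesky((M + M.T) / 2)      # M = L @ L.T (symmetrized against rounding)
    B = np.linalg.solve(L, Y.T).T              # B @ B.T = Y @ inv(M) @ Y.T
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    lam = np.maximum(s**2 - shift, 0.0)        # undo the shift
    return U, lam


def apply_inverse(U, lam, rho, v):
    """Return inv(U @ diag(lam) @ U.T + rho * I) @ v without forming a dim x dim matrix.
    Relies on U having orthonormal columns."""
    Utv = U.T @ v
    return U @ (Utv / (lam + rho)) + (v - U @ Utv) / rho


def sketchy_sgd_ridge(A, y, mu, rank, lr, rho, epochs=10, batch_size=32,
                      hess_batch_size=256, seed=0):
    """Minimize (1/2n)||A w - y||^2 + (mu/2)||w||^2 with curvature-scaled SGD steps.
    lr is a fixed stepsize (assumption); a value that is too large can diverge."""
    rng = np.random.default_rng(seed)
    n, dim = A.shape
    w = np.zeros(dim)
    for _ in range(epochs):
        # Refresh the curvature estimate from a Hessian minibatch (data term only).
        S = rng.choice(n, size=min(hess_batch_size, n), replace=False)
        A_S = A[S]
        U, lam = nystrom_approx(lambda V: A_S.T @ (A_S @ V) / len(S), dim, rank, rng)
        for _ in range(n // batch_size):
            idx = rng.choice(n, size=batch_size, replace=False)
            grad = A[idx].T @ (A[idx] @ w - y[idx]) / batch_size + mu * w
            w = w - lr * apply_inverse(U, lam, rho, grad)
    return w
```

A convenient sanity check under these assumptions: with `rank` equal to the number of features, `batch_size` and `hess_batch_size` equal to the number of samples, `rho = mu`, `lr = 1.0`, and `epochs = 1`, the loop performs a single exact Newton step from zero and returns the ridge solution. Smaller ranks and batches trade that exactness for cheaper iterations.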
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM
Methods: Logistic Regression · Stochastic Gradient Descent · Adam
