GPU Accelerated Sub-Sampled Newton's Method
Sudhir B. Kylasa, Farbod Roosta-Khorasani, Michael W. Mahoney, and Ananth Grama

TL;DR
This paper demonstrates that GPU-accelerated, sub-sampled second-order Newton methods can outperform first-order methods in large-scale machine learning tasks by leveraging curvature information efficiently.
Contribution
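It introduces GPU-accelerated, sub-sampled Newton methods and argues that, for large-scale convex problems, they can be more efficient than first-order methods, challenging the conventional belief that second-order methods are too costly per iteration in big-data regimes.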
Findings
GPU acceleration significantly speeds up Newton-type methods.
Sub-sampled Newton methods achieve higher accuracy faster.
The proposed methods outperform first-order methods as implemented in popular ML software.
Abstract
First-order methods, which rely solely on gradient information, are commonly used in diverse machine learning (ML) and data analysis (DA) applications. This is attributed to the simplicity of their implementations, as well as low per-iteration computational/storage costs. However, they suffer from significant disadvantages; most notably, their performance degrades with increasing problem ill-conditioning. Furthermore, they often involve a large number of hyper-parameters, and are notoriously sensitive to parameters such as the step-size. By incorporating additional information from the Hessian, second-order methods have been shown to be resilient to many such adversarial effects. However, these advantages of using curvature information come at the cost of higher per-iteration costs, which, in "big data" regimes, can be computationally prohibitive. In this paper, we show that,…
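To make the idea in the abstract concrete, here is a minimal CPU-only sketch of one common form of sub-sampled Newton: the gradient is computed on the full data set, while the Hessian is estimated from a small uniform random sample and applied through Hessian-vector products inside a conjugate-gradient solve. This is not the authors' GPU implementation. The objective (ridge-regularized binary logistic regression), the uniform sampling, the Armijo backtracking line search, and every function and parameter name below (`subsampled_newton`, `sample_size`, `lam`, and so on) are assumptions chosen for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def loss(w, X, y, lam):
    # Mean logistic loss (labels in {0, 1}) plus ridge penalty (assumed objective).
    total = 0.0
    for xi, yi in zip(X, y):
        z = dot(xi, w)
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - yi * z
    return total / len(X) + 0.5 * lam * dot(w, w)

def gradient(w, X, y, lam):
    # Full gradient over all n samples.
    n = len(X)
    g = [lam * wj for wj in w]
    for xi, yi in zip(X, y):
        r = (sigmoid(dot(xi, w)) - yi) / n
        for j, xij in enumerate(xi):
            g[j] += r * xij
    return g

def subsampled_hvp(w, X, idx, lam, v):
    # Product of the sub-sampled Hessian with v, using only the rows in idx.
    out = [lam * vj for vj in v]
    for i in idx:
        xi = X[i]
        p = sigmoid(dot(xi, w))
        c = p * (1.0 - p) * dot(xi, v) / len(idx)
        for j, xij in enumerate(xi):
            out[j] += c * xij
    return out

def conjugate_gradient(hvp, b, iters=20, tol=1e-10):
    # Approximately solve H x = b for symmetric positive definite H.
    x = [0.0] * len(b)
    r, p = list(b), list(b)
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol:
            break
        Hp = hvp(p)
        alpha = rs / dot(p, Hp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def subsampled_newton(X, y, lam=1e-2, sample_size=50, steps=10, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    f = loss(w, X, y, lam)
    history = [f]
    for _ in range(steps):
        g = gradient(w, X, y, lam)
        idx = rng.sample(range(n), min(sample_size, n))  # uniform sub-sample
        direction = conjugate_gradient(
            lambda v: subsampled_hvp(w, X, idx, lam, v), [-gj for gj in g])
        slope = dot(g, direction)
        t = 1.0
        while t > 1e-8:  # Armijo backtracking; keep w if no step is accepted
            cand = [wj + t * pj for wj, pj in zip(w, direction)]
            f_cand = loss(cand, X, y, lam)
            if f_cand <= f + 1e-4 * t * slope:
                w, f = cand, f_cand
                break
            t *= 0.5
        history.append(f)
    return w, history
```

Calling `subsampled_newton(X, y)` with `X` as a list of feature rows and `y` as a list of 0/1 labels returns the final weights and the loss after each step. The Hessian work per iteration scales with `sample_size` rather than with the full data set, which is the per-iteration saving that sub-sampling targets; the dense products inside `gradient` and `subsampled_hvp` are the kind of operations a GPU implementation would batch.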
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms
