Accelerated Parallel Optimization Methods for Large Scale Machine Learning
Haipeng Luo, Patrick Haffner, Jean-Francois Paiement

TL;DR
This paper develops accelerated parallel optimization algorithms combining Nesterov's acceleration and parallelism to improve efficiency and scalability for large-scale machine learning problems, especially with high-dimensional data.
Contribution
It introduces an accelerated parallel version of Shotgun, improving convergence rates, and refines the analysis of BOOM, providing a unified framework for related methods.
Findings
Accelerated Shotgun achieves a faster convergence rate of O(1/t^2).
A refined sparsity measurement improves BOOM's performance.
A unified framework simplifies the analysis of parallel optimization methods.
Abstract
The growing amount of high-dimensional data in different machine learning applications requires more efficient and scalable optimization algorithms. In this work, we consider combining two techniques, parallelism and Nesterov's acceleration, to design faster algorithms for L1-regularized loss. We first simplify BOOM, a variant of gradient descent, and study it in a unified framework, which allows us to not only propose a refined measurement of sparsity to improve BOOM, but also show that BOOM is provably slower than FISTA. Moving on to parallel coordinate descent methods, we then propose an efficient accelerated version of Shotgun, improving the convergence rate from O(1/t) to O(1/t^2). Our algorithm enjoys a concise form and analysis compared to previous work, and also allows one to study several related works in a unified way.
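For readers unfamiliar with the FISTA baseline named in the abstract, the following is a minimal sketch of Nesterov-accelerated proximal gradient descent on an L1-regularized least-squares problem. It is not the paper's accelerated Shotgun or BOOM. The function names, the toy matrix, and the step size are assumptions chosen for illustration, and the code uses plain Python lists so that it runs without third-party packages.

```python
import math


def soft_threshold(v, tau):
    """Proximal operator of tau * |.| applied to a single scalar."""
    return math.copysign(max(abs(v) - tau, 0.0), v)


def gradient(A, b, x):
    """Gradient of 0.5 * ||A x - b||^2, which is A^T (A x - b)."""
    residual = [sum(a * xj for a, xj in zip(row, x)) - bi
                for row, bi in zip(A, b)]
    return [sum(A[i][j] * residual[i] for i in range(len(A)))
            for j in range(len(x))]


def fista(A, b, lam, step, iters):
    """Accelerated proximal gradient for
    0.5 * ||A x - b||^2 + lam * ||x||_1.

    step is assumed to satisfy step <= 1 / L, where L is the largest
    eigenvalue of A^T A (illustrative helper, not from the paper).
    """
    n = len(A[0])
    x = [0.0] * n      # current iterate
    y = list(x)        # extrapolated point where the gradient is taken
    t = 1.0            # momentum sequence
    for _ in range(iters):
        g = gradient(A, b, y)
        # Gradient step on the smooth part, then the L1 proximal step.
        x_new = [soft_threshold(yj - step * gj, step * lam)
                 for yj, gj in zip(y, g)]
        # Nesterov momentum update.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        momentum = (t - 1.0) / t_new
        y = [xn + momentum * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x


# Toy separable problem (assumed for illustration): L = 4, so step = 0.25.
A = [[2.0, 0.0], [0.0, 1.0]]
b = [4.0, 0.5]
x_hat = fista(A, b, lam=1.0, step=0.25, iters=50)
print(x_hat)
```

Because the toy problem is separable, the minimizer can be checked by hand: the first coordinate solves 2(2x - 4) + 1 = 0, giving 1.75, and the second is thresholded to zero. Dropping the momentum term (using y = x_new) gives plain proximal gradient descent, whose O(1/t) guarantee is what acceleration improves to O(1/t^2).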
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Sparse and Compressive Sensing Techniques
