Fast Asynchronous Parallel Stochastic Gradient Decent

Shen-Yi Zhao; Wu-Jun Li

arXiv:1508.05711·stat.ML·August 25, 2015·5 cites

TL;DR

This paper introduces AsySVRG, a fast asynchronous parallel stochastic gradient descent method that improves convergence and efficiency over existing methods like Hogwild! for large-scale machine learning tasks.

Contribution

The paper proposes AsySVRG, an asynchronous parallel SGD algorithm that combines SVRG with an innovative asynchronous strategy, enhancing performance in large-scale settings.

Findings

1. AsySVRG outperforms Hogwild! in convergence rate.
2. AsySVRG reduces computation cost compared to existing methods.
3. Theoretical analysis confirms faster convergence of AsySVRG.

Abstract

Stochastic gradient descent (SGD) and its variants have become more and more popular in machine learning due to their efficiency and effectiveness. To handle large-scale problems, researchers have recently proposed several parallel SGD methods for multicore systems. However, existing parallel SGD methods cannot achieve satisfactory performance in real applications. In this paper, we propose a fast asynchronous parallel SGD method, called AsySVRG, by designing an asynchronous strategy to parallelize the recently proposed SGD variant called stochastic variance reduced gradient (SVRG). Both theoretical and empirical results show that AsySVRG can outperform existing state-of-the-art parallel SGD methods like Hogwild! in terms of convergence rate and computation cost.
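For readers unfamiliar with SVRG, the following is a minimal serial sketch of the variance-reduced update that AsySVRG builds on. It is not the paper's asynchronous algorithm: there are no threads here, and the least-squares objective, the function name `svrg_least_squares`, and all parameter names and defaults are assumptions chosen for illustration.

```python
import random


def svrg_least_squares(X, y, step=0.05, epochs=200, inner_steps=None, seed=0):
    """Serial SVRG sketch for f(w) = (1/n) * sum_i 0.5 * (x_i . w - y_i)^2.

    Hypothetical helper for illustration only; AsySVRG would run the
    inner loop concurrently across threads on a multicore machine.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = inner_steps or 2 * n  # inner-loop length (an assumed default)
    w = [0.0] * d

    def grad_i(weights, i):
        # Gradient of the i-th loss term: (x_i . w - y_i) * x_i
        residual = sum(X[i][j] * weights[j] for j in range(d)) - y[i]
        return [residual * X[i][j] for j in range(d)]

    for _ in range(epochs):
        # Outer loop: take a snapshot and compute its full gradient once.
        snapshot = list(w)
        full_grad = [0.0] * d
        for i in range(n):
            g = grad_i(snapshot, i)
            for j in range(d):
                full_grad[j] += g[j] / n

        # Inner loop: stochastic steps corrected by the snapshot gradient.
        for _ in range(m):
            i = rng.randrange(n)
            g_w = grad_i(w, i)
            g_s = grad_i(snapshot, i)
            for j in range(d):
                w[j] -= step * (g_w[j] - g_s[j] + full_grad[j])
    return w
```

The correction term `g_w - g_s + full_grad` is an unbiased estimate of the full gradient whose variance shrinks as `w` and the snapshot approach the optimum, which is what lets SVRG use a constant step size where plain SGD needs a decaying one.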

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Face and Expression Recognition

Methods: Stochastic Gradient Descent