Scaling up Stochastic Gradient Descent for Non-convex Optimisation

Saad Mohamad; Hamad Alamri; Abdelhamid Bouchachia

arXiv:2210.02882·stat.ML·October 7, 2022

TL;DR

This paper introduces DPSGD, a scalable distributed and parallel stochastic gradient descent method for non-convex optimization, demonstrating theoretical convergence guarantees and empirical improvements in machine learning and reinforcement learning tasks.

Contribution

The paper presents DPSGD, a unified asynchronous and lock-free framework that improves scalability and convergence in non-convex optimization for large datasets and distributed systems.

Findings

1. DPSGD achieves near-linear speed-up as the number of cores and workers grows.
2. A theoretical convergence rate of O(1/√T) holds under resource bounds.
3. Empirical validation on latent Dirichlet allocation (LDA) and advantage actor-critic (A2C) shows performance gains.
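
For context on the O(1/√T) rate listed in the findings: in the non-convex SGD literature, a rate of this kind usually bounds the expected squared gradient norm of the best iterate rather than the distance to a global optimum. The form below is that generic statement, given only as an illustration. The symbols f, x_t, and C are assumptions introduced here, and the paper's exact theorem, its constants, and the precise form of its resource bounds are not reproduced on this page.

```latex
% Generic shape of an O(1/\sqrt{T}) guarantee for non-convex SGD (illustrative only).
% f   : smooth, possibly non-convex objective (assumed notation)
% x_t : parameter vector after t updates     (assumed notation)
% C   : constant depending on smoothness, gradient variance, and delay terms
\[
  \min_{1 \le t \le T} \; \mathbb{E}\!\left[ \lVert \nabla f(x_t) \rVert^{2} \right]
  \;\le\; \frac{C}{\sqrt{T}}
\]
```

Read this way, the guarantee says that some iterate becomes approximately stationary as T grows, which is the usual notion of convergence when the objective is non-convex.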

Abstract

Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions and large datasets. We address the bottleneck problem arising when using both shared and distributed memory. Typically, the former is bounded by limited computation resources and bandwidth whereas the latter suffers from communication overheads. We propose a unified distributed and parallel implementation of SGD (named DPSGD) that relies on both asynchronous distribution and lock-free parallelism. By combining two strategies into a unified framework, DPSGD is able to strike a better trade-off between local computation and communication. The convergence properties of DPSGD are studied for non-convex problems such as those arising in statistical…
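
The two ingredients named in the abstract, lock-free parallelism on shared memory and asynchronous distribution across workers, can be illustrated with a small toy simulation. This is a minimal sketch under stated assumptions and not the authors' implementation: the objective, the function names (`loss_and_grad`, `local_lock_free_sgd`, `run_demo`), the delta-push scheme, and all hyperparameters are hypothetical choices made for illustration. Threads stand in for cores, and a loop that serves workers in random order with stale parameter snapshots stands in for asynchronous communication.

```python
import threading
import numpy as np


def loss_and_grad(w, X, y):
    """Squared error on a sigmoid output: a simple non-convex toy objective."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    r = p - y
    loss = float(np.mean(r ** 2))
    grad = 2.0 * X.T @ (r * p * (1.0 - p)) / len(y)
    return loss, grad


def local_lock_free_sgd(snapshot, X, y, n_threads, steps, batch, lr, seed):
    """Threads update one shared local vector without any lock.

    Returns the accumulated change relative to the snapshot the worker started from.
    """
    w_local = snapshot.copy()

    def run(w, thread_seed):
        rng = np.random.default_rng(thread_seed)
        for _ in range(steps):
            idx = rng.integers(0, len(y), size=batch)
            _, g = loss_and_grad(w, X[idx], y[idx])
            w -= lr * g  # in-place, unsynchronised write to shared memory

    threads = [threading.Thread(target=run, args=(w_local, seed + t))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w_local - snapshot


def run_demo(n_workers=3, n_pushes=40, seed=0):
    """Simulate asynchronous workers that push deltas computed from stale snapshots."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(512, 5))
    w_true = rng.normal(size=5)
    y = 1.0 / (1.0 + np.exp(-(X @ w_true)))
    shards = np.array_split(np.arange(len(y)), n_workers)  # one data shard per worker

    w_global = np.zeros(5)
    initial_loss, _ = loss_and_grad(w_global, X, y)
    snapshots = [w_global.copy() for _ in range(n_workers)]

    for push in range(n_pushes):
        k = int(rng.integers(n_workers))  # workers finish in arbitrary order
        rows = shards[k]
        delta = local_lock_free_sgd(snapshots[k], X[rows], y[rows],
                                    n_threads=2, steps=10, batch=32,
                                    lr=0.2, seed=1000 * push)
        w_global += delta                 # applied even if the snapshot was stale
        snapshots[k] = w_global.copy()    # the worker then pulls fresh parameters

    final_loss, _ = loss_and_grad(w_global, X, y)
    return initial_loss, final_loss, w_global


if __name__ == "__main__":
    before, after, _ = run_demo()
    print(f"loss before: {before:.4f}, loss after: {after:.4f}")
```

Because thread scheduling is not deterministic, the exact losses vary between runs. The point of the sketch is only the structure: many cheap unsynchronised updates inside a worker, followed by one infrequent push to the global parameters, which is one way to trade local computation against communication.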

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Privacy-Preserving Technologies in Data

Methods: Stochastic Gradient Descent · Variational Inference