Scaling up Stochastic Gradient Descent for Non-convex Optimisation
Saad Mohamad, Hamad Alamri, Abdelhamid Bouchachia

TL;DR
This paper introduces DPSGD, a scalable distributed and parallel stochastic gradient descent method for non-convex optimization, with theoretical convergence guarantees and empirical improvements on machine learning and reinforcement learning tasks.
Contribution
The paper presents DPSGD, a unified asynchronous and lock-free framework that improves scalability and convergence in non-convex optimization for large datasets and distributed systems.
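To make the combination of asynchronous distribution and lock-free parallelism concrete, here is a minimal sketch of the general idea. It is not the authors' implementation: threads stand in for both cores and workers, the toy objective (a single tanh unit with squared loss) is chosen only because it is non-convex, and every name (`core_loop`, `worker_loop`, `run_sketch`) and hyperparameter is hypothetical.

```python
import threading

import numpy as np


def loss(w, X, y):
    """Mean squared error of a single tanh unit; non-convex in w."""
    return float(np.mean((np.tanh(X @ w) - y) ** 2))


def grad(w, xb, yb):
    """Mini-batch gradient of the loss above."""
    p = np.tanh(xb @ w)
    return (2.0 * (p - yb) * (1.0 - p ** 2)) @ xb / len(yb)


def core_loop(w_local, X, y, lr, steps, batch, seed):
    """One core: SGD steps on the worker's shared vector, with no lock."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        idx = rng.integers(0, len(y), size=batch)
        g = grad(w_local, X[idx], y[idx])  # may read a partially updated w_local
        w_local -= lr * g  # in-place update, deliberately unsynchronised


def worker_loop(server_w, server_lock, X, y, cores, rounds, steps, lr, batch, seed):
    """One worker: pull, run lock-free cores, push the delta; no barrier."""
    for r in range(rounds):
        with server_lock:  # short critical section, only to copy consistently
            w_local = server_w.copy()
        start = w_local.copy()
        threads = [
            threading.Thread(
                target=core_loop,
                args=(w_local, X, y, lr, steps, batch, seed + 100 * r + c),
            )
            for c in range(cores)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        with server_lock:
            server_w += w_local - start  # applied without waiting for other workers


def run_sketch(workers=2, cores=2, rounds=10, steps=50, lr=0.05, batch=32):
    """Train on synthetic data; return (initial_loss, final_loss)."""
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 5))
    y = np.tanh(X @ rng.normal(size=5))
    server_w = np.zeros(5)
    initial = loss(server_w, X, y)
    lock = threading.Lock()
    shards = np.array_split(np.arange(len(y)), workers)  # one data shard per worker
    pool = [
        threading.Thread(
            target=worker_loop,
            args=(server_w, lock, X[s], y[s], cores, rounds, steps, lr, batch, 1000 * (k + 1)),
        )
        for k, s in enumerate(shards)
    ]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return initial, loss(server_w, X, y)
```

Calling `run_sketch()` returns the loss before and after training. Thread interleaving is not deterministic, so the final value varies slightly between runs. The server-side lock here only protects the copy and the delta application; whether the paper's server update is itself lock-free is not stated in this summary.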
Findings
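DPSGD achieves near-linear speed-up with respect to the number of cores and the number of workers.
The theoretical convergence rate is O(1/√T), where T is the number of iterations, provided the number of cores and workers stays within stated bounds.
Empirical validation on Latent Dirichlet Allocation (LDA) and advantage actor-critic (A2C) shows performance gains.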
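For readers unfamiliar with how an O(1/√T) rate is usually stated for non-convex problems, the block below gives the common form. This is an assumption about the convention, not a quotation from the paper: f is the objective, x_t the iterate at step t, and C a constant independent of T. The paper's exact statement and constants may differ.

```latex
\frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\left[ \lVert \nabla f(x_t) \rVert^2 \right] \le \frac{C}{\sqrt{T}}
```

Under this reading, the guarantee concerns approach to a stationary point (small gradient norm on average), not to a global minimum.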
Abstract
Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions and large datasets. We address the bottleneck problem arising when using both shared and distributed memory. Typically, the former is bounded by limited computation resources and bandwidth whereas the latter suffers from communication overheads. We propose a unified distributed and parallel implementation of SGD (named DPSGD) that relies on both asynchronous distribution and lock-free parallelism. By combining two strategies into a unified framework, DPSGD is able to strike a better trade-off between local computation and communication. The convergence properties of DPSGD are studied for non-convex problems such as those arising in statistical…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Privacy-Preserving Technologies in Data
Methods: Stochastic Gradient Descent · Variational Inference
