Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning

Igor Colin; Ludovic Dos Santos; Kevin Scaman

arXiv:1910.05104·stat.ML·October 14, 2019·5 cites

Open Access

TL;DR

This paper studies the theoretical limits of pipeline parallel optimization for deep learning. It provides matching lower and upper complexity bounds for smooth objectives, a new algorithm for non-smooth convex objectives, and empirical evidence of that algorithm's advantages in difficult, limited-data scenarios.

Contribution

It gives a theoretical analysis of pipeline parallel optimization, proposes a new algorithm, Pipeline Parallel Random Smoothing (PPRS), for non-smooth convex functions, and demonstrates its practical benefits over traditional methods.

Findings

01

For smooth convex and non-convex objectives, a naive pipeline parallelization of Nesterov's accelerated gradient descent is optimal.

02

PPRS achieves a near-optimal convergence rate (within a $d^{1/4}$ multiplicative factor of optimal) with depth-dependent acceleration.

03

Empirical results show PPRS outperforms traditional algorithms in challenging non-smooth, limited-data problems.
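For readers unfamiliar with the method named in the first finding, the following is a minimal single-machine sketch of Nesterov's accelerated gradient descent for a smooth convex objective. It does not model pipelining or any distributed setup; the function name `nesterov_agd`, the constant step size `1 / lipschitz`, and the momentum schedule `(t - 1) / (t + 2)` are illustrative assumptions (a common textbook variant), not the paper's exact formulation.

```python
import numpy as np


def nesterov_agd(grad, x0, lipschitz, n_iters):
    """Textbook accelerated gradient descent (illustrative, not the paper's code).

    grad      -- callable returning the gradient at a point
    x0        -- starting point
    lipschitz -- assumed Lipschitz constant of the gradient (step size is 1/L)
    n_iters   -- number of iterations
    """
    x = np.asarray(x0, dtype=float)
    y = x.copy()  # extrapolated ("look-ahead") point
    for t in range(1, n_iters + 1):
        # Gradient step taken from the extrapolated point.
        x_next = y - grad(y) / lipschitz
        # Momentum extrapolation with the (t - 1) / (t + 2) schedule.
        y = x_next + (t - 1) / (t + 2) * (x_next - x)
        x = x_next
    return x
```

As a usage example, for a quadratic $f(x) = \tfrac{1}{2} x^\top A x - b^\top x$ one would pass `grad = lambda x: A @ x - b` and set `lipschitz` to the largest eigenvalue of `A`.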

Abstract

We investigate the theoretical limits of pipeline parallel learning of deep learning architectures, a distributed setup in which the computation is distributed per layer instead of per example. For smooth convex and non-convex objective functions, we provide matching lower and upper complexity bounds and show that a naive pipeline parallelization of Nesterov's accelerated gradient descent is optimal. For non-smooth convex functions, we provide a novel algorithm coined Pipeline Parallel Random Smoothing (PPRS) that is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension. While the algorithm still obeys a slow $\varepsilon^{-2}$ convergence rate, the depth-dependent part is accelerated, resulting in a near-linear speed-up and convergence time that only slightly depends on the depth of the deep learning architecture.…
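The abstract does not spell out how PPRS works, so the following is only a generic sketch of the random smoothing idea its name refers to: a non-smooth function $f$ is replaced by the smooth surrogate $f_\gamma(x) = \mathbb{E}[f(x + \gamma Z)]$, whose gradient can be estimated by averaging subgradients at randomly perturbed points. The Gaussian choice for $Z$, the function name `smoothed_gradient`, and the parameters `gamma` and `n_samples` are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def smoothed_gradient(subgrad, x, gamma, n_samples, rng):
    """Monte Carlo gradient estimate of f_gamma(x) = E[f(x + gamma * Z)].

    subgrad   -- callable returning a subgradient of f at a point
    x         -- point at which to estimate the gradient
    gamma     -- smoothing radius (assumed hyperparameter)
    n_samples -- number of random perturbations to average over
    rng       -- numpy random Generator
    """
    x = np.asarray(x, dtype=float)
    # Z ~ N(0, I): Gaussian perturbations (an assumption of this sketch).
    noise = rng.standard_normal((n_samples, x.size))
    # Average the subgradients evaluated at the perturbed points.
    grads = np.array([subgrad(x + gamma * z) for z in noise])
    return grads.mean(axis=0)
```

For example, with $f(x) = \|x\|_1$ one can pass `subgrad = np.sign`. Far from the kink at zero the estimate matches the ordinary gradient, while near zero the averaging interpolates smoothly between the two signs, which is what makes accelerated methods applicable to the surrogate.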

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning