BPPSA: Scaling Back-propagation by Parallel Scan Algorithm

Shang Wang; Yifan Bai; Gennady Pekhimenko

arXiv:1907.10134 · cs.LG · March 10, 2020 · 1 citation

TL;DR

This paper introduces BPPSA, which reformulates back-propagation as a scan operation so that gradient computation can scale across parallel devices, with reported speedups of up to 2.75x in overall training time.

Contribution

The paper reformulates back-propagation as a scan operation and modifies the Blelloch parallel scan algorithm so that this reformulation scales on parallel systems.
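The reformulation can be illustrated with a small sketch. Back-propagation repeatedly multiplies the current gradient by the transposed Jacobian of the previous layer, which is exactly an in-order aggregation over the sequence [g_n, J_n^T, ..., J_1^T]. The helper names (`matvec`, `transpose`, `backprop_sequential`, `backprop_as_scan`) and the toy Jacobians below are illustrative assumptions, not taken from the paper's code.

```python
from itertools import accumulate


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def backprop_sequential(grad_out, jacobians):
    """Classic BP: walk the layers from last to first, one step per layer."""
    grads = [grad_out]
    for jac in reversed(jacobians):
        grads.append(matvec(transpose(jac), grads[-1]))
    return grads


def backprop_as_scan(grad_out, jacobians):
    """Same gradients, written as an inclusive scan over [g_n, J_n^T, ..., J_1^T]."""
    items = [grad_out] + [transpose(j) for j in reversed(jacobians)]
    # The running value is a gradient vector; each new item is a transposed
    # Jacobian applied to it from the left.
    return list(accumulate(items, lambda g, jt: matvec(jt, g)))


# Hypothetical two-layer chain: x0 (dim 2) -> x1 (dim 3) -> x2 (dim 2).
J1 = [[1, 2], [0, 1], [3, 0]]   # d x1 / d x0, shape 3x2
J2 = [[1, 0, 2], [0, 1, 1]]     # d x2 / d x1, shape 2x3
g2 = [1, -1]                    # gradient of the loss w.r.t. x2

scan_grads = backprop_as_scan(g2, [J1, J2])
loop_grads = backprop_sequential(g2, [J1, J2])
```

Both functions return the gradients with respect to x2, x1, and x0, in that order. This sketch still evaluates the scan sequentially; a parallel evaluation would also need to combine two Jacobians by matrix-matrix multiplication, which is associative but not commutative.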

Findings

1. Up to 2.75x speedup in overall training time
2. Up to 108x speedup in the backward pass
3. Effective for RNN training and for retraining pruned networks

Abstract

In an era when the performance of a single compute device plateaus, software must be designed to scale on massively parallel systems for better runtime performance. However, in the context of training deep learning models, the popular back-propagation (BP) algorithm imposes a strong sequential dependency in the process of gradient computation. Under model parallelism, BP takes $\Theta(n)$ steps to complete, which hinders its scalability on parallel systems ($n$ represents the number of compute devices into which a model is partitioned). In this work, in order to improve the scalability of BP, we reformulate BP into a scan operation, which is a primitive that performs an in-order aggregation on a sequence of values and returns the partial result at each step. We can then scale such reformulation of BP on parallel systems by our modified version of the Blelloch scan algorithm which…
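To make the scan primitive concrete, the following is a minimal sketch of a Blelloch-style exclusive scan (up-sweep, then down-sweep) next to a plain sequential scan. It is a generic textbook-style version written for an associative operator that need not be commutative, not the paper's modified algorithm; the function names and the string-concatenation test operator are assumptions chosen for illustration. The inner loops at each tree level touch disjoint positions, which is what allows a parallel system to finish in a logarithmic number of levels when enough workers are available.

```python
def sequential_exclusive_scan(items, op, identity):
    """Reference scan: one step per element, strictly in order."""
    out, acc = [], identity
    for x in items:
        out.append(acc)
        acc = op(acc, x)
    return out


def blelloch_exclusive_scan(items, op, identity):
    """Exclusive scan via up-sweep and down-sweep over a balanced tree.

    `op` must be associative but need not be commutative. Iterations of the
    inner `for` loops are independent within a level, so they could run in
    parallel; here they run serially for clarity.
    """
    n = len(items)
    size = 1
    while size < n:
        size *= 2
    a = list(items) + [identity] * (size - n)  # pad to a power of two

    # Up-sweep: each node stores the in-order reduction of its subtree.
    d = 1
    while d < size:
        for i in range(2 * d - 1, size, 2 * d):
            a[i] = op(a[i - d], a[i])
        d *= 2

    # Down-sweep: push the prefix of each subtree down to its children.
    a[size - 1] = identity
    d = size // 2
    while d >= 1:
        for i in range(2 * d - 1, size, 2 * d):
            left_sum = a[i - d]
            a[i - d] = a[i]               # left child inherits the prefix
            a[i] = op(a[i], left_sum)     # right child: prefix, then left subtree
        d //= 2
    return a[:n]


def concat(x, y):
    """String concatenation: associative, not commutative."""
    return x + y


letters = ["a", "b", "c", "d", "e"]
tree_result = blelloch_exclusive_scan(letters, concat, "")
loop_result = sequential_exclusive_scan(letters, concat, "")
```

String concatenation is used as the operator because any mistake in operand order shows up immediately in the output, which matters for operators such as matrix multiplication where order cannot be swapped.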

Code & Models

Repositories

UofT-EcoSystem/BPPSA-open (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Ferroelectric and Negative Capacitance Devices