BPPSA: Scaling Back-propagation by Parallel Scan Algorithm
Shang Wang, Yifan Bai, Gennady Pekhimenko

TL;DR
This paper introduces BPPSA, a novel method that reformulates back-propagation as a scan operation, enabling scalable parallel training of deep learning models with significant speedups.
Contribution
It presents a new reformulation of back-propagation as a scan operation and a modified parallel scan algorithm to improve scalability on parallel systems.
Findings
Up to 2.75x speedup in overall training time
Up to 108x speedup in the backward pass
Effective for RNN training and for retraining pruned networks
Abstract
In an era when the performance of a single compute device plateaus, software must be designed to scale on massively parallel systems for better runtime performance. However, in the context of training deep learning models, the popular back-propagation (BP) algorithm imposes a strong sequential dependency in the process of gradient computation. Under model parallelism, BP takes Θ(n) steps to complete, which hinders its scalability on parallel systems (n represents the number of compute devices into which a model is partitioned). In this work, in order to improve the scalability of BP, we reformulate BP into a scan operation, which is a primitive that performs an in-order aggregation on a sequence of values and returns the partial result at each step. We can then scale such reformulation of BP on parallel systems by our modified version of the Blelloch scan algorithm which…
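The reformulation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a chain of layers with dense, square transposed Jacobians, a sequence length that is a power of two, and a hypothetical operator `op(a, b) = b @ a` with `None` standing in for the identity. The names `op`, `sequential_exclusive_scan`, and `blelloch_exclusive_scan` are invented for illustration, and the "parallel" loops are simulated serially with NumPy. The inner loop of each sweep level touches disjoint pairs, which is what a parallel system could execute concurrently.

```python
import numpy as np

def op(a, b):
    """Combine an earlier element a with a later element b as b @ a.

    None acts as the identity. The operator is associative but not
    commutative, so operand order matters in both sweeps.
    """
    if a is None:
        return b
    if b is None:
        return a
    return b @ a

def sequential_exclusive_scan(seq):
    """Reference scan: a linear number of dependent steps, like plain BP."""
    out, acc = [], None
    for s in seq:
        out.append(acc)
        acc = op(acc, s)
    return out

def blelloch_exclusive_scan(seq):
    """Blelloch-style up-sweep/down-sweep exclusive scan (serial simulation).

    Assumes len(seq) is a power of two. Each while-loop iteration is one
    level of the tree; its inner for-loop works on disjoint pairs.
    """
    a = list(seq)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two length"
    d = 1
    while d < n:  # up-sweep: build partial products bottom-up
        for i in range(0, n, 2 * d):
            left, right = i + d - 1, i + 2 * d - 1
            a[right] = op(a[left], a[right])
        d *= 2
    a[n - 1] = None  # identity at the root
    d = n // 2
    while d >= 1:  # down-sweep: push prefixes back down the tree
        for i in range(0, n, 2 * d):
            left, right = i + d - 1, i + 2 * d - 1
            t = a[left]
            a[left] = a[right]
            # The prefix comes first, then the left subtree's product.
            a[right] = op(a[right], t)
        d //= 2
    return a

# Toy chain: 7 layers of width 4, so the scanned sequence has 8 elements.
rng = np.random.default_rng(0)
width, n_layers = 4, 7
grad_out = rng.standard_normal(width)  # gradient w.r.t. the last activation
# Transposed Jacobians ordered from the last layer to the first.
jac_T = [rng.standard_normal((width, width)) for _ in range(n_layers)]
seq = [grad_out] + jac_T

ref = sequential_exclusive_scan(seq)
par = blelloch_exclusive_scan(seq)
# Entry 0 is the identity (None); entries 1..7 are the gradients w.r.t.
# the activations from the last layer back to the first.
match = all(np.allclose(r, p) for r, p in zip(ref[1:], par[1:]))
```

In this sketch the sequential scan needs one dependent step per element, while the Blelloch-style version needs a number of levels logarithmic in the sequence length, at the cost of matrix-matrix products during the up-sweep.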
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Ferroelectric and Negative Capacitance Devices
