Pipelined Training with Stale Weights of Deep Convolutional Neural Networks

Lifu Zhang; Tarek S. Abdelrahman

arXiv:1912.12675 · cs.DC · January 1, 2020

TL;DR

This paper investigates the effects of using stale weights in pipelined training of CNNs, showing that limited pipelining maintains accuracy and improves training speed, while deeper pipelining causes accuracy drops.

Contribution

It introduces a hybrid pipelined training scheme that balances accelerator utilization and accuracy, demonstrating practical implementation and performance gains.

Findings

1. Limited pipelining preserves accuracy within 1.45% of non-pipelined training.

2. Hybrid scheme mitigates accuracy loss when pipelining deeper layers.

3. Achieves up to 1.8X speedup on 2 GPUs with minimal accuracy drop.

Abstract

The growth in the complexity of Convolutional Neural Networks (CNNs) is increasing interest in partitioning a network across multiple accelerators during training and pipelining the backpropagation computations over the accelerators. Existing approaches avoid or limit the use of stale weights through techniques such as micro-batching or weight stashing. These techniques either underutilize accelerators or increase memory footprint. We explore the impact of stale weights on the statistical efficiency and performance in a pipelined backpropagation scheme that maximizes accelerator utilization and keeps memory overhead modest. We use 4 CNNs (LeNet-5, AlexNet, VGG and ResNet) and show that when pipelining is limited to early layers in a network, training with stale weights converges and results in models with comparable inference accuracies to those resulting from non-pipelined training…
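To make the notion of "stale weights" concrete, the following is a minimal toy sketch, not the paper's pipelined scheme. It assumes a one-dimensional objective f(w) = 0.5 * w^2 and a hypothetical helper `delayed_gradient_descent` in which each update uses a gradient evaluated at the weight from `delay` steps earlier, which loosely mimics a pipeline stage computing gradients with weights that have since been updated. The function name, objective, and parameter values are all illustrative assumptions.

```python
from collections import deque


def delayed_gradient_descent(w0, lr, delay, steps):
    """Minimize f(w) = 0.5 * w**2 with gradients taken at stale weights.

    Illustrative assumption only: `delay` is the number of updates by which
    the weight used for the gradient lags behind the current weight.
    """
    # history[0] is always the weight from `delay` updates ago.
    history = deque([w0] * (delay + 1), maxlen=delay + 1)
    w = w0
    for _ in range(steps):
        stale_w = history[0]
        grad = stale_w          # d/dw (0.5 * w^2), evaluated at the stale weight
        w = w - lr * grad       # update the current weight with the stale gradient
        history.append(w)       # oldest entry is dropped automatically
    return w


if __name__ == "__main__":
    for lr in (0.1, 0.9):
        for delay in (0, 2):
            w = delayed_gradient_descent(1.0, lr, delay, 200)
            print(f"lr={lr} delay={delay} final |w|={abs(w):.3e}")
```

In this toy setting, a small learning rate tolerates the delay and both runs approach the minimum at 0, whereas with the larger learning rate the delayed run grows instead of shrinking while the non-delayed run still converges. This only illustrates why staleness can affect convergence in general; it says nothing about the accuracy figures reported for the CNNs above.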

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning

Methods: Average Pooling · Global Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Kaiming Initialization · Residual Connection · Residual Block · Local Response Normalization