# Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism

**Authors:** Nikoli Dryden, Naoya Maruyama, Tom Benson, Tim Moon, Marc Snir, Brian Van Essen

arXiv: 1903.06681 · 2019-03-18

## TL;DR

This paper presents new parallelization strategies for CNN training that improve strong and weak scaling, enabling efficient training on large datasets and samples by exploiting finer-grained parallelism.

## Contribution

It introduces convolution approaches based on spatial and combined sample-spatial decomposition, along with a performance model to optimize parallelization strategies.
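To make the idea of spatial decomposition concrete, here is a minimal single-process NumPy sketch. It is not the authors' implementation: it assumes a single-channel 2D input, an odd-sized kernel, zero padding, and a split along the height dimension only. The function names (`conv2d_same`, `conv2d_spatial`) are hypothetical, and the "halo exchange" between neighboring partitions is simulated by slicing adjacent shards rather than by real inter-process communication.

```python
import numpy as np


def _valid_conv(padded, w, out_shape):
    """Slide kernel w over an already-padded array (cross-correlation)."""
    kh, kw = w.shape
    out = np.zeros(out_shape)
    for i in range(out_shape[0]):
        for j in range(out_shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * w)
    return out


def conv2d_same(x, w):
    """Reference: convolve the whole sample at once with zero padding."""
    ph, pw = w.shape[0] // 2, w.shape[1] // 2
    return _valid_conv(np.pad(x, ((ph, ph), (pw, pw))), w, x.shape)


def conv2d_spatial(x, w, num_parts):
    """Same result, computed band by band as if each band lived on its own rank."""
    ph, pw = w.shape[0] // 2, w.shape[1] // 2
    shards = np.array_split(x, num_parts, axis=0)  # each "rank" owns a band of rows
    # Simplifying assumption: the halo comes from the immediate neighbor only.
    assert all(s.shape[0] >= ph for s in shards), "halo must fit in one neighbor"
    zeros = np.zeros((ph, x.shape[1]))  # stands in for padding at the global border
    outputs = []
    for rank, local in enumerate(shards):
        # Simulated halo exchange: borrow ph boundary rows from each neighbor.
        if rank > 0:
            up = shards[rank - 1]
            top = up[up.shape[0] - ph:]
        else:
            top = zeros
        bottom = shards[rank + 1][:ph] if rank < num_parts - 1 else zeros
        # Rows are padded by halos; columns still need ordinary zero padding.
        padded = np.pad(np.vstack([top, local, bottom]), ((0, 0), (pw, pw)))
        outputs.append(_valid_conv(padded, w, local.shape))
    return np.vstack(outputs)


rng = np.random.default_rng(0)
x = rng.standard_normal((12, 10))
w = rng.standard_normal((3, 3))
print(np.allclose(conv2d_same(x, w), conv2d_spatial(x, w, num_parts=3)))
```

Each band only ever touches its own rows plus `kh // 2` rows from each neighbor, which is why this kind of decomposition can reduce per-device memory for very large samples. A combined sample-spatial scheme would additionally split the mini-batch dimension across groups of ranks.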

## Key findings

- Achieved excellent strong and weak scaling in CNN training.
- Enabled training on datasets with very large samples previously infeasible.
- Demonstrated effectiveness with ResNet-50 and a mesh-tangling dataset.

## Abstract

Scaling CNN training is necessary to keep up with growing datasets and reduce training time. We also see an emerging need to handle datasets with very large samples, where memory requirements for training are large. Existing training frameworks use a data-parallel approach that partitions samples within a mini-batch, but limits to scaling the mini-batch size and memory consumption make this untenable for large samples. We describe and implement new approaches to convolution, which parallelize using spatial decomposition or a combination of sample and spatial decomposition. This introduces many performance knobs for a network, so we develop a performance model for CNNs and present a method for using it to automatically determine efficient parallelization strategies. We evaluate our algorithms with microbenchmarks and image classification with ResNet-50. Our algorithms allow us to prototype a model for a mesh-tangling dataset, where sample sizes are very large. We show that our parallelization achieves excellent strong and weak scaling and enables training for previously unreachable datasets.
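The abstract does not spell out the performance model, so the following is only a toy illustration of what "using a model to pick a parallelization strategy" could look like for one convolutional layer. Everything in it is an assumption made for illustration: the cost formula (FLOPs divided by a fixed rate, plus halo bytes divided by a fixed bandwidth), the default hardware numbers, and the function names `factor_pairs`, `layer_cost`, and `best_strategy`. It ignores gradient allreduce, memory limits, and overlap of communication with computation.

```python
from math import ceil, inf


def factor_pairs(p):
    """All (sample_ways, spatial_ways) splits whose product is p processes."""
    return [(s, p // s) for s in range(1, p + 1) if p % s == 0]


def layer_cost(n, c, h, w, f, k, sample_ways, spatial_ways,
               flop_rate=1e13, bandwidth=1e10):
    """Hypothetical per-layer time estimate in seconds (illustrative only).

    n: mini-batch size, c/f: input/output channels, h/w: spatial size,
    k: square kernel size. The height dimension is split spatial_ways times.
    """
    if sample_ways > n or spatial_ways > h:
        return inf  # cannot give every process at least one sample / one row
    local_n = ceil(n / sample_ways)
    local_h = ceil(h / spatial_ways)
    flops = 2 * local_n * c * f * local_h * w * k * k
    # Assumed halo traffic: k // 2 rows to each of two neighbors, 4-byte floats.
    halo_bytes = 0 if spatial_ways == 1 else 2 * (k // 2) * w * c * local_n * 4
    return flops / flop_rate + halo_bytes / bandwidth


def best_strategy(n, c, h, w, f, k, p):
    """Pick the cheapest (sample_ways, spatial_ways) split under the toy model."""
    return min(factor_pairs(p),
               key=lambda sw: layer_cost(n, c, h, w, f, k, sw[0], sw[1]))


# A single very large sample cannot be split across the sample dimension.
print(best_strategy(n=1, c=16, h=2048, w=2048, f=32, k=3, p=4))
```

Under this toy model, pure sample parallelism wins whenever the mini-batch is large enough, because it incurs no halo traffic; spatial splitting becomes necessary once the number of processes exceeds the number of samples. A realistic model would also have to account for the terms omitted here.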

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.06681/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/1903.06681/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/1903.06681/full.md

---
Source: https://tomesphere.com/paper/1903.06681