An argument in favor of strong scaling for deep neural networks with small datasets

Renato L. de F. Cunha; Eduardo R. Rodrigues; Matheus Palhares Viana; Dario Augusto Borges Oliveira

arXiv:1807.09161 · cs.DC · July 15, 2020

TL;DR

This paper advocates strong scaling for training deep neural networks on small datasets. It shows that strong scaling maintains accuracy comparable to sequential training, whereas weak scaling often fails to converge.

Contribution

The paper challenges the common use of weak scaling for small datasets and provides empirical evidence supporting strong scaling as a more effective approach.

Findings

1. Weak scaling often fails to converge on small datasets.
2. Strong scaling maintains accuracy comparable to sequential training.
3. Strong scaling demonstrates good scalability up to 32 GPUs.

Abstract

In recent years, with the popularization of deep learning frameworks and large datasets, researchers have started parallelizing their models in order to train faster. This is crucially important because they typically explore many hyperparameters to find the best ones for their applications. This process is time-consuming, so speeding up training improves productivity. One approach to parallelizing deep learning models, followed by many researchers, is based on weak scaling: the minibatch grows in size as new GPUs are added to the system. In addition, new learning rate schedules have been proposed to fix optimization issues that occur with large minibatch sizes. In this paper, however, we show that the recommendations provided by recent work do not apply to models that lack large datasets. In fact, we argue in favor of using strong scaling for achieving…
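To make the contrast between the two schemes concrete, the Python sketch below computes the per-GPU minibatch, the global minibatch, the optimizer steps per epoch, and the learning rate under each one. It assumes plain data-parallel SGD. For the large-minibatch schedule it uses the linear scaling rule with gradual warmup from Goyal et al. (2017) as a representative example, because the abstract does not say which schedules the authors evaluated. The function names, the 10,000-sample dataset, and the base hyperparameters are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPlan:
    gpus: int
    per_gpu_batch: int
    global_batch: int
    steps_per_epoch: int
    learning_rate: float


def weak_scaling(gpus, per_gpu_batch, dataset_size, base_lr, base_batch):
    """Weak scaling (hypothetical helper): per-GPU minibatch is fixed,
    so the global minibatch grows linearly with the number of GPUs."""
    global_batch = gpus * per_gpu_batch
    # Assumed schedule: linear scaling rule, LR proportional to global batch.
    lr = base_lr * global_batch / base_batch
    steps = math.ceil(dataset_size / global_batch)
    return ScalingPlan(gpus, per_gpu_batch, global_batch, steps, lr)


def strong_scaling(gpus, global_batch, dataset_size, base_lr):
    """Strong scaling (hypothetical helper): the global minibatch stays at
    its sequential value and is split evenly across GPUs."""
    if global_batch % gpus != 0:
        raise ValueError("global_batch must be divisible by gpus")
    steps = math.ceil(dataset_size / global_batch)
    # Same global batch as sequential training, so the LR is left unchanged.
    return ScalingPlan(gpus, global_batch // gpus, global_batch, steps, base_lr)


def warmup_lr(step, warmup_steps, target_lr, start_lr):
    """Gradual warmup (assumed): ramp linearly from start_lr to target_lr
    over warmup_steps optimizer steps, then hold target_lr."""
    if step >= warmup_steps:
        return target_lr
    return start_lr + (target_lr - start_lr) * step / warmup_steps


if __name__ == "__main__":
    # Hypothetical setup: 10,000 samples, sequential batch 64, LR 0.1, 32 GPUs.
    print(weak_scaling(32, 64, 10_000, 0.1, 64))
    print(strong_scaling(32, 64, 10_000, 0.1))
```

With these values, weak scaling produces a global minibatch of 2048, only 5 optimizer steps per epoch, and a learning rate of 3.2. Strong scaling keeps the sequential global minibatch of 64 (2 samples per GPU), 157 steps per epoch, and the original learning rate of 0.1. This gives one intuition, not the paper's own analysis, for why large-minibatch recipes may transfer poorly to small datasets: each epoch offers far fewer parameter updates. The trade-off is that strong scaling shrinks the work done on each GPU, so communication overhead grows relative to computation as GPUs are added.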
