FireCaffe: near-linear acceleration of deep neural network training on compute clusters

Forrest N. Iandola; Khalid Ashraf; Matthew W. Moskewicz; Kurt Keutzer

arXiv:1511.00175·cs.CV·January 11, 2016

TL;DR

FireCaffe demonstrates near-linear scaling of deep neural network training across GPU clusters by optimizing communication strategies and hardware choices, significantly reducing training time for high-accuracy models.

Contribution

The paper introduces FireCaffe, a scalable framework for distributed DNN training that reduces communication overhead and enables near-linear speedup on GPU clusters.

Findings

01 Achieved 47x speedup on GoogLeNet with 128 GPUs.

02 Reduced communication overhead using reduction trees.

03 Identified hyperparameters for large-batch training without accuracy loss.
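
To illustrate why a reduction tree lowers communication overhead, the following is a minimal sketch of summing per-worker gradients with a binary tree. It is a single-process simulation, not FireCaffe's implementation: the function name `tree_reduce`, the binary fan-in, and the use of plain Python lists as gradient buffers are all assumptions made for illustration. The point it shows is that the number of sequential communication rounds grows as ceil(log2(n)) in the number of workers n, whereas a scheme in which one node receives every gradient in turn would need n - 1 serialized receives.

```python
def tree_reduce(grads):
    """Sum per-worker gradient vectors with a binary reduction tree.

    Hypothetical helper for illustration only. Returns (summed_gradient,
    rounds), where rounds counts the sequential communication steps.
    """
    bufs = [list(g) for g in grads]  # copy so the inputs are left untouched
    n = len(bufs)
    rounds = 0
    stride = 1
    while stride < n:
        # Within one round, every (receiver, sender) pair is independent,
        # so on a real cluster these transfers could run in parallel.
        for i in range(0, n - stride, 2 * stride):
            dst = bufs[i]
            src = bufs[i + stride]
            for k in range(len(dst)):
                dst[k] += src[k]
        stride *= 2
        rounds += 1
    return bufs[0], rounds


if __name__ == "__main__":
    # 128 simulated workers, each holding a tiny 2-element "gradient".
    grads = [[float(w), 1.0] for w in range(128)]
    total, rounds = tree_reduce(grads)
    print(total, rounds)
```

With 128 simulated workers the sum completes in 7 rounds, compared with 127 serialized receives for a single collecting node. The sketch ignores bandwidth, message size, and the broadcast of updated weights back to the workers, all of which matter in a real cluster.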

Abstract

Long training times for high-accuracy deep neural networks (DNNs) impede research into new DNN architectures and slow the development of high-accuracy DNNs. In this paper we present FireCaffe, which successfully scales deep neural network training across a cluster of GPUs. We also present a number of best practices to aid in comparing advancements in methods for scaling and accelerating the training of deep neural networks. The speed and scalability of distributed algorithms is almost always limited by the overhead of communicating between servers; DNN training is not an exception to this rule. Therefore, the key consideration here is to reduce communication overhead wherever possible, while not degrading the accuracy of the DNN models that we train. Our approach has three key pillars. First, we select network hardware that achieves high bandwidth between GPU servers -- Infiniband or…

