FireCaffe: near-linear acceleration of deep neural network training on compute clusters
Forrest N. Iandola, Khalid Ashraf, Matthew W. Moskewicz, Kurt Keutzer

TL;DR
FireCaffe demonstrates near-linear scaling of deep neural network training across GPU clusters by optimizing communication strategies and hardware choices, significantly reducing training time for high-accuracy models.
Contribution
The paper introduces FireCaffe, a scalable framework for distributed DNN training that reduces communication overhead and enables near-linear speedup on GPU clusters.
Findings
Achieved 47x speedup on GoogLeNet with 128 GPUs.
Reduced communication overhead using reduction trees.
Identified hyperparameters for large-batch training without accuracy loss.
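The reduction-tree finding can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not FireCaffe's code. It sums per-worker gradient vectors up a binary tree and broadcasts the total back down, counting sequential communication rounds. It then compares a simplified, bandwidth-only cost model, which ignores latency, for a single parameter server against a reduction tree. The function names (`tree_allreduce`, `param_server_comm_time`, `reduction_tree_comm_time`) and the cost formulas are hypothetical illustrations. The key point is that the single parameter server's traffic grows with the number of workers p, while the tree's grows with log2(p).

```python
import math
from typing import List, Tuple


def tree_allreduce(grads: List[List[float]]) -> Tuple[List[List[float]], int]:
    """Hypothetical sketch: sum per-worker gradients with a binary reduction
    tree, then broadcast the sum back down the same tree.
    Returns (per-worker summed gradients, number of sequential rounds)."""
    p = len(grads)
    buf = [list(g) for g in grads]  # each simulated worker's local buffer
    rounds = 0
    # Reduce phase: at stride s, worker i adds in worker i + s's partial sum.
    stride = 1
    while stride < p:
        for i in range(0, p, 2 * stride):
            j = i + stride
            if j < p:
                buf[i] = [a + b for a, b in zip(buf[i], buf[j])]
        stride *= 2
        rounds += 1
    # Broadcast phase: walk the tree in reverse so every worker gets the total.
    while stride > 1:
        stride //= 2
        for i in range(0, p, 2 * stride):
            j = i + stride
            if j < p:
                buf[j] = list(buf[i])
        rounds += 1
    return buf, rounds


def param_server_comm_time(p: int, model_bytes: float, bw: float) -> float:
    """Simplified model (assumption): one parameter server receives p gradient
    copies and sends p weight copies through its single link of bandwidth bw."""
    return 2 * p * model_bytes / bw


def reduction_tree_comm_time(p: int, model_bytes: float, bw: float) -> float:
    """Simplified model (assumption): ceil(log2 p) rounds up the tree plus the
    same number back down, each moving one model-sized message per link."""
    if p <= 1:
        return 0.0
    return 2 * math.ceil(math.log2(p)) * model_bytes / bw


if __name__ == "__main__":
    grads = [[float(w), 1.0] for w in range(4)]
    summed, rounds = tree_allreduce(grads)
    print(summed[3], rounds)  # [6.0, 4.0] 4
    # Ratio of modeled communication time at 128 workers (model size and
    # bandwidth cancel out, so placeholder values of 1.0 are used).
    ratio = param_server_comm_time(128, 1.0, 1.0) / reduction_tree_comm_time(128, 1.0, 1.0)
    print(f"parameter server / reduction tree at p=128: {ratio:.1f}x")
```

Under this toy model, doubling the worker count doubles the single parameter server's communication time but adds only two rounds to the tree. This is why a tree-based exchange is better suited to the near-linear scaling the paper targets.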
Abstract
Long training times for high-accuracy deep neural networks (DNNs) impede research into new DNN architectures and slow the development of high-accuracy DNNs. In this paper we present FireCaffe, which successfully scales deep neural network training across a cluster of GPUs. We also present a number of best practices to aid in comparing advancements in methods for scaling and accelerating the training of deep neural networks. The speed and scalability of distributed algorithms are almost always limited by the overhead of communicating between servers; DNN training is not an exception to this rule. Therefore, the key consideration here is to reduce communication overhead wherever possible, while not degrading the accuracy of the DNN models that we train. Our approach has three key pillars. First, we select network hardware that achieves high bandwidth between GPU servers -- InfiniBand or…
