SparkNet: Training Deep Networks in Spark
Philipp Moritz, Robert Nishihara, Ion Stoica, Michael I. Jordan

TL;DR
SparkNet is a framework for training deep neural networks on Spark clusters. It uses a simple parallelization scheme for stochastic gradient descent that tolerates high-latency communication and scales with cluster size, and it stays compatible with existing Caffe models, with the aim of reducing training time.
Contribution
The paper introduces a Spark-based deep learning framework built on a simple parallelization scheme. The framework is scalable, easy to deploy, compatible with Caffe models, and designed for high-latency environments.
Findings
SparkNet scales well with cluster size.
High-latency communication is tolerated effectively.
Benchmarking on ImageNet shows competitive performance.
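One way to see why infrequent synchronization can tolerate slow communication is a back-of-the-envelope cost model. The sketch below is an illustration only: the function names, the cost model, and all numbers are assumptions chosen for the example, not measurements or formulas from the paper. It assumes each worker spends a fixed time per SGD iteration and pays a fixed synchronization cost once every `local_steps` iterations.

```python
def time_per_iteration(compute_s, sync_s, local_steps):
    """Average wall-clock seconds per SGD iteration when workers
    synchronize once every `local_steps` iterations (assumed model)."""
    if local_steps < 1:
        raise ValueError("local_steps must be at least 1")
    # The synchronization cost is spread over the iterations between syncs.
    return compute_s + sync_s / local_steps


def sync_overhead_fraction(compute_s, sync_s, local_steps):
    """Fraction of wall-clock time spent on synchronization."""
    total = time_per_iteration(compute_s, sync_s, local_steps)
    return (sync_s / local_steps) / total


if __name__ == "__main__":
    # Hypothetical numbers: 1 s of compute per iteration, 20 s per sync.
    for steps in (1, 10, 50):
        t = time_per_iteration(1.0, 20.0, steps)
        f = sync_overhead_fraction(1.0, 20.0, steps)
        print(f"local_steps={steps}: {t:.2f} s/iteration, {f:.0%} in sync")
```

Under these assumed numbers, synchronizing after every iteration leaves most of the time in communication, while synchronizing every 50 iterations makes it a minority of the total. The model ignores the fact that less frequent synchronization may require more iterations to reach the same accuracy, so it shows only one side of the trade-off.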
Abstract
Training deep networks is a time-consuming process, with networks for object recognition often requiring multiple days to train. For this reason, leveraging the resources of a cluster to speed up training is an important area of work. However, widely-popular batch-processing computational frameworks like MapReduce and Spark were not designed to support the asynchronous and communication-intensive workloads of existing distributed deep learning systems. We introduce SparkNet, a framework for training deep networks in Spark. Our implementation includes a convenient interface for reading data from Spark RDDs, a Scala interface to the Caffe deep learning framework, and a lightweight multi-dimensional tensor library. Using a simple parallelization scheme for stochastic gradient descent, SparkNet scales well with the cluster size and tolerates very high-latency communication. Furthermore, it…
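The abstract mentions a simple parallelization scheme for stochastic gradient descent without spelling it out. As a minimal sketch, assuming the scheme is periodic parameter averaging (each worker runs a fixed number of local SGD steps on its own data shard, then the parameters are averaged and redistributed), the idea can be simulated in plain Python. Everything here is hypothetical illustration: the one-parameter model, the function names, and the hyperparameters are not taken from SparkNet, and no Spark or Caffe calls are used.

```python
import random


def local_sgd(w, data, steps, lr, rng):
    """Run `steps` SGD iterations on one worker's shard for the model y = w * x."""
    for _ in range(steps):
        x, y = rng.choice(data)
        grad = 2.0 * (w * x - y) * x  # derivative of (w*x - y)^2 with respect to w
        w -= lr * grad
    return w


def train(shards, rounds=30, steps=10, lr=0.1, seed=0):
    """Alternate between independent local SGD on each shard and averaging."""
    rng = random.Random(seed)
    w = 0.0  # shared parameter, handed to every simulated worker each round
    for _ in range(rounds):
        # Workers are simulated sequentially; on a cluster this would run in parallel.
        local = [local_sgd(w, shard, steps, lr, rng) for shard in shards]
        w = sum(local) / len(local)  # the only communication step per round
    return w


def make_shards(num_workers=4, points_per_shard=25, true_w=3.0, seed=1):
    """Build noiseless synthetic data y = true_w * x, split across workers."""
    rng = random.Random(seed)
    shards = []
    for _ in range(num_workers):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(points_per_shard)]
        shards.append([(x, true_w * x) for x in xs])
    return shards


if __name__ == "__main__":
    print(train(make_shards()))  # should land close to the true value 3.0
```

The design point this illustrates is that communication happens once per round rather than once per gradient step, so the `steps` parameter controls how much computation is done between synchronizations.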
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
