SparkNet: Training Deep Networks in Spark

Philipp Moritz; Robert Nishihara; Ion Stoica; Michael I. Jordan

arXiv:1511.06051 · stat.ML · March 1, 2016 · ICLR · 75 cites

TL;DR

SparkNet is a framework for training deep neural networks on Spark clusters. It is designed to tolerate the high communication latency of such clusters, scale with cluster size, and remain compatible with existing Caffe models, with the goal of reducing training time.

Contribution

It introduces a Spark-based deep learning framework with a simple parallelization scheme, high scalability, and ease of deployment, compatible with Caffe models and designed for high-latency environments.
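The parallelization scheme is not spelled out on this page. A minimal sketch, assuming it amounts to periodic parameter averaging (each worker runs a fixed number of local SGD steps on its own data shard, then a driver averages the resulting parameters and rebroadcasts them), might look like the following. It is a pure-Python toy on a noiseless least-squares problem; the names `local_sgd`, `average`, `train`, and `tau` are illustrative assumptions, not part of the SparkNet or Caffe API.

```python
import random

def local_sgd(w, shard, lr, steps, rng):
    """Run `steps` single-example SGD updates (squared loss) on one worker's shard."""
    w = list(w)  # copy so the broadcast parameters are not mutated
    for _ in range(steps):
        x, y = rng.choice(shard)
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def average(models):
    """Element-wise mean of the workers' parameter vectors."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]

def train(shards, dim, rounds=50, tau=20, lr=0.1, seed=0):
    """Alternate tau local SGD steps per worker with one averaging step."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(rounds):
        # "Broadcast" w, let each worker train independently, then average.
        # Workers communicate only once every tau steps.
        local_models = [local_sgd(w, shard, lr, tau, rng) for shard in shards]
        w = average(local_models)
    return w

def make_shards(true_w, n_workers=4, n_per_worker=50, seed=1):
    """Synthetic, noiseless linear data split across hypothetical workers."""
    rng = random.Random(seed)
    shards = []
    for _ in range(n_workers):
        shard = []
        for _ in range(n_per_worker):
            x = [rng.uniform(-1.0, 1.0) for _ in true_w]
            y = sum(wi * xi for wi, xi in zip(true_w, x))
            shard.append((x, y))
        shards.append(shard)
    return shards

TRUE_W = [2.0, -3.0]
w = train(make_shards(TRUE_W), dim=len(TRUE_W))
```

In this sketch, raising `tau` reduces how often workers synchronize, which is the kind of trade-off that matters when communication latency is high. In a real Spark job the list comprehension over shards would be replaced by a distributed map over partitions.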

Findings

01. SparkNet scales well with cluster size.

02. High-latency communication is tolerated effectively.

03. Benchmarking on ImageNet shows competitive performance.

Abstract

Training deep networks is a time-consuming process, with networks for object recognition often requiring multiple days to train. For this reason, leveraging the resources of a cluster to speed up training is an important area of work. However, widely-popular batch-processing computational frameworks like MapReduce and Spark were not designed to support the asynchronous and communication-intensive workloads of existing distributed deep learning systems. We introduce SparkNet, a framework for training deep networks in Spark. Our implementation includes a convenient interface for reading data from Spark RDDs, a Scala interface to the Caffe deep learning framework, and a lightweight multi-dimensional tensor library. Using a simple parallelization scheme for stochastic gradient descent, SparkNet scales well with the cluster size and tolerates very high-latency communication. Furthermore, it…

Code & Models

Repositories

amplab/SparkNet (official)

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings