SLIDE: In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems

Beidi Chen; Tharun Medini; James Farwell; Sameh Gobriel; Charlie Tai; Anshumali Shrivastava

arXiv:1903.03129·cs.DC·March 3, 2020·44 cites

Open Access 3 Repos

TL;DR

SLIDE is a CPU-based training engine that combines randomized algorithms with multi-core parallelism and workload optimization, and it outperforms GPU-based training on large-scale deep learning tasks without specialized hardware.

Contribution

The paper presents SLIDE (Sub-LInear Deep learning Engine), which combines randomized algorithms with multi-core parallelism to substantially accelerate deep learning training on commodity CPUs, reducing reliance on expensive specialized hardware.

Findings

01. SLIDE trains large neural networks 3.5 times faster than TensorFlow on GPUs.

02. On the same CPU hardware, SLIDE is more than 10 times faster than TensorFlow.

03. Training with SLIDE on CPUs outperforms GPU-based training at every accuracy level.

Abstract

Deep Learning (DL) algorithms are the central focus of modern machine learning systems. As data volumes keep growing, it has become customary to train large neural networks with hundreds of millions of parameters to maintain enough capacity to memorize these volumes and obtain state-of-the-art accuracy. To get around the costly computations associated with large models and data, the community is increasingly investing in specialized hardware for model training. However, specialized hardware is expensive and hard to generalize to a multitude of tasks. The progress on the algorithmic front has failed to demonstrate a direct advantage over powerful hardware such as NVIDIA-V100 GPUs. This paper provides an exception. We propose SLIDE (Sub-LInear Deep learning Engine) that uniquely blends smart randomized algorithms with multi-core parallelism and workload optimization. Using just a CPU,…
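
The abstract above does not name the randomized component. In the SLIDE paper it is locality-sensitive hashing (LSH): each layer's neuron weight vectors are stored in hash tables, and for a given input only the neurons that collide with it are computed, so each pass touches a small, input-dependent subset of the layer. The following is a minimal, single-threaded sketch of that idea, assuming SimHash (signed random projections) as the hash family and a ReLU layer. The class names `SimHashTable` and `SampledLayer` and the parameters `num_bits` and `num_tables` are hypothetical illustrations, not SLIDE's actual C++ API.

```python
import random
from collections import defaultdict


def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class SimHashTable:
    """One LSH table keyed by a num_bits-bit SimHash signature (hypothetical sketch)."""

    def __init__(self, dim, num_bits, rng):
        # Assumption: num_bits random Gaussian hyperplanes define one table.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
        self.buckets = defaultdict(list)

    def signature(self, vec):
        # Each bit records which side of one random hyperplane the vector lies on,
        # so vectors separated by a small angle tend to share signatures.
        return tuple(1 if dot(p, vec) >= 0 else 0 for p in self.planes)

    def insert(self, neuron_id, weight):
        self.buckets[self.signature(weight)].append(neuron_id)

    def query(self, vec):
        # .get() avoids creating empty buckets for signatures never inserted.
        return self.buckets.get(self.signature(vec), [])


class SampledLayer:
    """Hypothetical ReLU layer that evaluates only neurons retrieved by LSH."""

    def __init__(self, weights, num_bits=4, num_tables=8, seed=0):
        rng = random.Random(seed)
        self.weights = weights  # weights[i] is neuron i's incoming weight vector
        dim = len(weights[0])
        self.tables = [SimHashTable(dim, num_bits, rng) for _ in range(num_tables)]
        for i, w in enumerate(weights):
            for table in self.tables:
                table.insert(i, w)

    def active_set(self, x):
        # Union of the buckets that x falls into across all tables.
        active = set()
        for table in self.tables:
            active.update(table.query(x))
        return active

    def forward(self, x):
        # Only sampled neurons are computed; every other neuron is treated as 0.
        return {i: max(0.0, dot(self.weights[i], x)) for i in self.active_set(x)}


if __name__ == "__main__":
    rng = random.Random(1)
    W = [[rng.uniform(-1.0, 1.0) for _ in range(16)] for _ in range(1000)]
    layer = SampledLayer(W, num_bits=6, num_tables=4, seed=42)
    out = layer.forward(W[7])
    print(f"computed {len(out)} of {len(W)} neurons")
```

Raising `num_bits` makes buckets smaller (fewer active neurons, cheaper but noisier), while adding tables raises recall. The sketch deliberately omits SLIDE's sparse backpropagation through the active set, its periodic hash-table rebuilding, and its multi-core parallelism.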

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Machine Learning and Data Classification