Benchmarking TPU, GPU, and CPU Platforms for Deep Learning

Yu Emma Wang; Gu-Yeon Wei; David Brooks

arXiv:1907.10701 · cs.LG · October 23, 2019 · 232 cites

TL;DR

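This paper introduces ParaDnn, a parameterized benchmark suite for evaluating deep learning hardware platforms, and uses it alongside six real-world models to compare Google's Cloud TPU v2/v3, NVIDIA's V100 GPU, and an Intel Skylake CPU, showing that each platform has unique strengths for some model types and exposing TPU bottlenecks.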

Contribution

It presents ParaDnn for systematic benchmarking and provides an in-depth comparison of TPU, GPU, and CPU platforms for deep learning workloads.

Findings

01 TPU, GPU, and CPU each have unique strengths for some types of models

02 The TPU deep dive reveals specific architectural bottlenecks

03 Specialized software stacks deliver rapid performance improvements on the TPU and GPU

Abstract

Training deep learning models is compute-intensive and there is an industry-wide trend towards hardware specialization to improve performance. To systematically benchmark deep learning platforms, we introduce ParaDnn, a parameterized benchmark suite for deep learning that generates end-to-end models for fully connected (FC), convolutional (CNN), and recurrent (RNN) neural networks. Along with six real-world models, we benchmark Google's Cloud TPU v2/v3, NVIDIA's V100 GPU, and an Intel Skylake CPU platform. We take a deep dive into TPU architecture, reveal its bottlenecks, and highlight valuable lessons learned for future specialized system design. We also provide a thorough comparison of the platforms and find that each has unique strengths for some types of models. Finally, we quantify the rapid performance improvements that specialized software stacks provide for the TPU and GPU…
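
To make the idea of a parameterized benchmark suite concrete, the sketch below enumerates a small grid of fully connected (FC) model configurations and computes each one's parameter count. It is an illustration only, not ParaDnn's code: the names `FCConfig` and `sweep_fc`, the choice of hyperparameters, and every value in the grid are assumptions made for this example and are not taken from the paper or its repository.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class FCConfig:
    """One point in a hypothetical fully connected (FC) sweep."""
    layers: int      # number of hidden layers
    nodes: int       # units per hidden layer
    input_dim: int   # input feature size
    output_dim: int  # number of output units
    batch_size: int  # training batch size

    def param_count(self) -> int:
        """Weights plus biases for input -> hidden layers -> output."""
        dims = [self.input_dim] + [self.nodes] * self.layers + [self.output_dim]
        return sum(a * b + b for a, b in zip(dims, dims[1:]))


def sweep_fc(layers, nodes, input_dims, output_dims, batch_sizes):
    """Yield one FCConfig per combination of the given hyperparameter values."""
    for combo in product(layers, nodes, input_dims, output_dims, batch_sizes):
        yield FCConfig(*combo)


# Illustrative grid (assumed values, not ParaDnn's actual ranges).
configs = list(sweep_fc(
    layers=[4, 8],
    nodes=[32, 64],
    input_dims=[100],
    output_dims=[10],
    batch_sizes=[64, 128],
))

# The smallest model in the grid, by parameter count.
smallest = min(configs, key=FCConfig.param_count)
```

Each `FCConfig` would then be handed to a model builder and timed on a given platform; sweeping the grid rather than fixing a single model is what lets a benchmark of this kind relate performance to model size and shape.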

Code & Models

Repositories

Emma926/paradnn (TensorFlow, official)

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Ferroelectric and Negative Capacitance Devices