Benchmarking TPU, GPU, and CPU Platforms for Deep Learning
Yu Emma Wang, Gu-Yeon Wei, David Brooks

TL;DR
This paper introduces ParaDnn, a comprehensive benchmark suite for evaluating deep learning hardware platforms, and compares TPU, GPU, and CPU performance across various models, revealing unique strengths and bottlenecks.
Contribution
It presents ParaDnn for systematic benchmarking and provides an in-depth comparison of TPU, GPU, and CPU platforms for deep learning workloads.
Findings
TPU, GPU, and CPU each excel at different model types
A deep dive into the TPU architecture reveals its bottlenecks
Specialized software stacks rapidly improve TPU and GPU performance
Abstract
Training deep learning models is compute-intensive and there is an industry-wide trend towards hardware specialization to improve performance. To systematically benchmark deep learning platforms, we introduce ParaDnn, a parameterized benchmark suite for deep learning that generates end-to-end models for fully connected (FC), convolutional (CNN), and recurrent (RNN) neural networks. Along with six real-world models, we benchmark Google's Cloud TPU v2/v3, NVIDIA's V100 GPU, and an Intel Skylake CPU platform. We take a deep dive into TPU architecture, reveal its bottlenecks, and highlight valuable lessons learned for future specialized system design. We also provide a thorough comparison of the platforms and find that each has unique strengths for some types of models. Finally, we quantify the rapid performance improvements that specialized software stacks provide for the TPU and GPU…
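To make the idea of a parameterized benchmark suite concrete, here is a minimal sketch of how a sweep over fully connected (FC) model configurations might be enumerated. It is not ParaDnn's actual code or API: the function names (`fc_param_count`, `sweep_fc_configs`), the choice of swept hyperparameters, and all numeric values are assumptions made for illustration only.

```python
from itertools import product


def fc_param_count(input_dim, hidden_dim, num_layers, output_dim):
    """Count the weights and biases of a plain fully connected network.

    The network has `num_layers` hidden layers of width `hidden_dim`,
    followed by one output layer of width `output_dim`.
    """
    dims = [input_dim] + [hidden_dim] * num_layers + [output_dim]
    # Each layer contributes a (d_in x d_out) weight matrix plus d_out biases.
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))


def sweep_fc_configs(layer_counts, hidden_dims, batch_sizes,
                     input_dim=2000, output_dim=1000):
    """Yield one configuration per point in the hyperparameter grid.

    The default input and output sizes are arbitrary placeholders,
    not values taken from the paper.
    """
    for num_layers, hidden_dim, batch_size in product(
            layer_counts, hidden_dims, batch_sizes):
        yield {
            "num_layers": num_layers,
            "hidden_dim": hidden_dim,
            "batch_size": batch_size,
            "num_params": fc_param_count(
                input_dim, hidden_dim, num_layers, output_dim),
        }


# Hypothetical grid: 2 depths x 2 widths x 2 batch sizes.
configs = list(sweep_fc_configs([4, 8], [512, 1024], [64, 256]))
for cfg in configs:
    print(cfg)
```

Each configuration could then be handed to a model builder and timed on the platform under test; sweeping a grid like this, rather than relying on a handful of fixed models, is what lets a benchmark relate performance to model size and batch size.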
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Ferroelectric and Negative Capacitance Devices
