Benchmarking the Performance and Energy Efficiency of AI Accelerators for AI Training

Yuxin Wang; Qiang Wang; Shaohuai Shi; Xin He; Zhenheng Tang; Kaiyong Zhao; Xiaowen Chu

arXiv:1909.06842·cs.DC·October 12, 2020·6 cites

Open Access

TL;DR

This paper conducts a comprehensive empirical benchmarking of popular AI accelerators, analyzing their performance and energy efficiency across various deep learning workloads to guide users and inform hardware improvements.

Contribution

It provides a detailed comparison of performance and energy consumption of different off-the-shelf AI processors, considering hardware, software libraries, and frameworks.

Findings

01. NVIDIA GPUs outperform others in training speed.

02. Energy efficiency varies significantly among processors.

03. Software libraries impact performance and energy consumption.

Abstract

Deep learning has become widely used in complex AI applications. Yet training a deep neural network (DNN) model requires a considerable amount of computation, a long running time, and much energy. Many-core AI accelerators (e.g., GPUs and TPUs) are now designed to improve the performance of AI training. However, processors from different vendors differ in performance and energy consumption. To investigate the differences among several popular off-the-shelf processors (i.e., Intel CPU, NVIDIA GPU, AMD GPU, and Google TPU) in training DNNs, we carry out a comprehensive empirical study on the performance and energy efficiency of these processors by benchmarking a representative set of deep learning workloads, including computation-intensive operations, classical convolutional neural networks (CNNs), recurrent neural networks (LSTM), Deep Speech 2, and…

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Adam · Dropout · Multi-Head Attention · Byte Pair Encoding · Dense Connections