Benanza: Automatic $\mu$Benchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs

Cheng Li; Abdul Dakkak; Jinjun Xiong; Wen-mei Hwu

arXiv:1911.06922 · cs.LG · June 4, 2020


TL;DR

Benanza is a tool that automatically generates micro-benchmarks to estimate the ideal ("lower-bound") latency of deep learning models on GPUs, helping to identify optimization opportunities and improve response times.

Contribution

This paper introduces Benanza, a novel framework for automatic benchmark generation and analysis to accurately estimate lower-bound latency and guide optimizations of deep learning models on GPUs.
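This listing does not spell out how the lower bound is computed. As a hypothetical sketch, assume the lower-bound latency is the sequential sum of each layer's fastest micro-benchmarked latency, with identical layers (same operator and shape parameters) benchmarked only once. `Layer`, `lower_bound_latency`, and the `benchmark` callback are illustrative names, not Benanza's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Iterable


@dataclass(frozen=True)
class Layer:
    # Hypothetical layer description: operator type plus shape parameters.
    # frozen=True makes it hashable, so identical layers share one cache entry.
    op: str
    params: tuple


def lower_bound_latency(
    layers: Iterable[Layer],
    benchmark: Callable[[Layer], Dict[Hashable, float]],
) -> float:
    """Assumed sequential lower bound: sum of each layer's fastest latency.

    `benchmark(layer)` is a hypothetical callback returning a mapping from
    algorithm name to measured latency (e.g. in ms) for that layer.
    """
    cache: Dict[Layer, float] = {}
    total = 0.0
    for layer in layers:
        if layer not in cache:
            # Benchmark each unique layer only once and keep the best algorithm.
            cache[layer] = min(benchmark(layer).values())
        total += cache[layer]
    return total
```

For example, a model with two identical convolution layers triggers only one benchmark call for that convolution. Taking the minimum over algorithms is what makes the result a "best case" to compare a framework's measured latency against.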

Findings

01

Benanza successfully evaluated 30 models across 7 GPUs.

02

Identified optimization opportunities such as layer fusion and the use of Tensor Cores.

03

Provided insights into framework inefficiencies and convolution algorithm choices.

Abstract

As Deep Learning (DL) models have been increasingly used in latency-sensitive applications, there has been a growing interest in improving their response time. An important avenue for such improvement is to profile the execution of these models and characterize their performance to identify possible optimization opportunities. However, the current profiling tools lack the highly desired abilities to characterize ideal performance, identify sources of inefficiency, and quantify the benefits of potential optimizations. Such deficiencies have led to slow characterization/optimization cycles that cannot keep up with the fast pace at which new DL models are introduced. We propose Benanza, a sustainable and extensible benchmarking and analysis design that speeds up the characterization/optimization cycle of DL models on GPUs. Benanza consists of four major components: a model processor that…


Taxonomy

Methods: Convolution