Benanza: Automatic $\mu$Benchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs
Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-mei Hwu

TL;DR
Benanza is a tool that automatically generates micro-benchmarks to estimate the ideal latency of deep learning models on GPUs, helping to identify optimization opportunities and improve response times.
Contribution
This paper introduces Benanza, a novel framework for automatic benchmark generation and analysis to accurately estimate lower-bound latency and guide optimizations of deep learning models on GPUs.
Findings
Benanza successfully evaluated 30 models across 7 GPUs.
It identified optimization opportunities such as layer fusion and Tensor Core usage.
It provided insights into framework inefficiencies and convolution algorithm choices.
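To make the idea of a "lower-bound" latency concrete, the following is a minimal sketch, not Benanza's actual implementation. It assumes that each layer has already been micro-benchmarked in isolation under several algorithm choices, and that the bound is the sum of the fastest measurement per layer. The function names, layer keys, algorithm names, and timings are all hypothetical.

```python
from typing import Dict, List

# Hypothetical benchmark results: layer signature -> {algorithm: latency in ms}.
# Every key and number below is invented for illustration.
BenchmarkTable = Dict[str, Dict[str, float]]


def lower_bound_latency(layers: List[str], table: BenchmarkTable) -> float:
    """Sum the fastest benchmarked latency of every layer in the model."""
    total = 0.0
    for layer in layers:
        timings = table[layer]  # raises KeyError if the layer was never benchmarked
        total += min(timings.values())
    return total


def slowdown(measured_ms: float, lower_bound_ms: float) -> float:
    """Ratio of measured end-to-end latency to the lower bound."""
    return measured_ms / lower_bound_ms


table: BenchmarkTable = {
    "conv3x3_64": {"implicit_gemm": 0.5, "winograd": 0.25},
    "relu_64": {"default": 0.125},
}
layers = ["conv3x3_64", "relu_64", "conv3x3_64"]

bound = lower_bound_latency(layers, table)
print(f"lower bound: {bound} ms, slowdown: {slowdown(1.25, bound)}x")
```

Under these assumptions, a large gap between the measured latency and the bound points to overhead outside the layers themselves, while a gap between algorithms for the same layer points to a suboptimal convolution algorithm choice.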
Abstract
As Deep Learning (DL) models have been increasingly used in latency-sensitive applications, there has been a growing interest in improving their response time. An important avenue for such improvement is to profile the execution of these models and characterize their performance to identify possible optimization opportunities. However, current profiling tools lack the highly desired abilities to characterize ideal performance, identify sources of inefficiency, and quantify the benefits of potential optimizations. Such deficiencies have led to slow characterization/optimization cycles that cannot keep up with the fast pace at which new DL models are introduced. We propose Benanza, a sustainable and extensible benchmarking and analysis design that speeds up the characterization/optimization cycle of DL models on GPUs. Benanza consists of four major components: a model processor that…
Taxonomy
Methods: Convolution
