# DeLTA: GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis

**Authors:** Sangkug Lym, Donghyuk Lee, Mike O'Connor, Niladrish Chatterjee, Mattan Erez

arXiv: 1904.01691 · 2020-04-28

## TL;DR

DeLTA is an analytical GPU performance model that estimates the memory traffic at each level of the GPU memory hierarchy for convolution layers in CNN training. The authors use it to balance the scaling of GPU compute and memory resources.

## Contribution

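The paper presents DeLTA, which the authors describe as the first analytical model to accurately estimate traffic at each GPU memory hierarchy level while accounting for the complex reuse patterns of a parallel convolution algorithm. It also shows how the model can guide the scaling of GPU resources.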

## Key findings

- The model is reported to be accurate and robust across different CNNs and GPU architectures.
- It enables balanced scaling of GPU compute and memory resources.
- It improves understanding of GPU performance bottlenecks in CNN training.
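To illustrate why compute and memory resources need to be scaled together, here is a minimal roofline-style sketch. It is not DeLTA's model: DeLTA estimates traffic at each memory hierarchy level and accounts for data reuse, whereas this sketch assumes an idealized DRAM lower bound in which every tensor element moves exactly once. The names `ConvLayer` and `roofline_time`, and the hardware figures in the usage lines, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    """Shape of one convolution layer (hypothetical helper, not from the paper)."""
    batch: int
    in_channels: int
    out_channels: int
    in_h: int
    in_w: int
    kernel: int
    stride: int = 1
    pad: int = 0

    def out_size(self):
        # Standard output-size formula for a strided, padded convolution.
        out_h = (self.in_h + 2 * self.pad - self.kernel) // self.stride + 1
        out_w = (self.in_w + 2 * self.pad - self.kernel) // self.stride + 1
        return out_h, out_w

    def flops(self):
        # Each output element needs in_channels * kernel^2 multiply-adds,
        # counted as 2 FLOPs each.
        out_h, out_w = self.out_size()
        macs = (self.batch * self.out_channels * out_h * out_w
                * self.in_channels * self.kernel ** 2)
        return 2 * macs

    def min_dram_bytes(self, bytes_per_elem=4):
        # ASSUMPTION: idealized lower bound where inputs, weights, and outputs
        # each cross the DRAM interface exactly once (perfect on-chip reuse).
        out_h, out_w = self.out_size()
        inputs = self.batch * self.in_channels * self.in_h * self.in_w
        weights = self.out_channels * self.in_channels * self.kernel ** 2
        outputs = self.batch * self.out_channels * out_h * out_w
        return (inputs + weights + outputs) * bytes_per_elem


def roofline_time(layer, peak_flops, dram_bw):
    """Return (seconds, bottleneck) as the larger of compute and memory time."""
    t_compute = layer.flops() / peak_flops
    t_memory = layer.min_dram_bytes() / dram_bw
    bottleneck = "compute" if t_compute >= t_memory else "memory"
    return max(t_compute, t_memory), bottleneck


# Usage with made-up hardware numbers: compare a baseline against doubled compute.
layer = ConvLayer(batch=32, in_channels=64, out_channels=64,
                  in_h=56, in_w=56, kernel=3, pad=1)
base = roofline_time(layer, peak_flops=10e12, dram_bw=500e9)
scaled = roofline_time(layer, peak_flops=20e12, dram_bw=500e9)
print(base, scaled)
```

In this simplified view, adding compute throughput stops helping once the layer becomes memory bound, which is the kind of imbalance a more detailed traffic model is meant to expose.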

## Abstract

Training convolutional neural networks (CNNs) requires intense compute throughput and high memory bandwidth. Especially, convolution layers account for the majority of the execution time of CNN training, and GPUs are commonly used to accelerate these layer workloads. GPU design optimization for efficient CNN training acceleration requires the accurate modeling of how their performance improves when computing and memory resources are increased. We present DeLTA, the first analytical model that accurately estimates the traffic at each GPU memory hierarchy level, while accounting for the complex reuse patterns of a parallel convolution algorithm. We demonstrate that our model is both accurate and robust for different CNNs and GPU architectures. We then show how this model can be used to carefully balance the scaling of different GPU resources for efficient CNN performance improvement.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.01691/full.md

## Figures

38 figures with captions in the complete paper: https://tomesphere.com/paper/1904.01691/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/1904.01691/full.md

---
Source: https://tomesphere.com/paper/1904.01691