# Exploiting Parallelism Opportunities with Deep Learning Frameworks

**Authors:** Yu Emma Wang, Carole-Jean Wu, Xiaodong Wang, Kim Hazelwood, David Brooks

arXiv: 1908.04705 · 2020-07-01

## TL;DR

This paper analyzes how key design features of deep learning frameworks affect performance, quantifies the role of parallelism, and distills the findings into tuning guidelines that outperform the Intel and TensorFlow recommended settings by 1.29x and 1.34x, respectively.

## Contribution

The paper analyzes the performance impact of key framework design features, quantifies the role of parallelism, and distills the observations into a simple set of tuning guidelines for training and inference.

## Key findings

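- The proposed tuning guidelines outperform the Intel and TensorFlow recommended settings by 1.29x and 1.34x, respectively, across a diverse set of real-world deep learning models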
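- Exploiting parallelism according to the guidelines yields higher training and inference speedup
- Key framework design features have a measurable performance impact, and finding a performance-optimal setting without guidance requires substantial profiling and domain-specific knowledge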

## Abstract

State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using a performance-optimal setting in feature-rich frameworks, however, involves a non-trivial amount of performance profiling efforts and often relies on domain-specific knowledge. This paper takes a deep dive into analyzing the performance impact of key design features in a machine learning framework and quantifies the role of parallelism. The observations and insights distill into a simple set of guidelines that one can use to achieve much higher training and inference speedup. Across a diverse set of real-world deep learning models, the evaluation results show that the proposed performance tuning guidelines outperform the Intel and TensorFlow recommended settings by 1.29x and 1.34x, respectively.
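To make the profiling burden mentioned in the abstract concrete, here is a minimal sketch of a brute-force sweep over two thread-count knobs, followed by a speedup calculation against a baseline setting. It is not the paper's method or its guidelines. The knob names (`inter`, `intra`), the `run_workload` callback, and the candidate thread counts are all hypothetical placeholders for whatever settings a given framework exposes.

```python
import itertools
import time
from typing import Callable, Dict, Sequence, Tuple

# (inter_op_threads, intra_op_threads): hypothetical knob names for illustration.
Setting = Tuple[int, int]


def sweep_settings(
    run_workload: Callable[[int, int], None],
    thread_counts: Sequence[int] = (1, 2, 4),
    repeats: int = 3,
    timer: Callable[[], float] = time.perf_counter,
) -> Dict[Setting, float]:
    """Time run_workload under every (inter, intra) pair.

    For each pair, the fastest of `repeats` runs is kept to reduce noise.
    """
    results: Dict[Setting, float] = {}
    for inter, intra in itertools.product(thread_counts, repeat=2):
        best = float("inf")
        for _ in range(repeats):
            start = timer()
            run_workload(inter, intra)
            best = min(best, timer() - start)
        results[(inter, intra)] = best
    return results


def speedup_over(
    results: Dict[Setting, float], baseline: Setting
) -> Tuple[Setting, float]:
    """Return the fastest setting and its speedup relative to `baseline`."""
    fastest = min(results, key=results.get)
    return fastest, results[baseline] / results[fastest]
```

In this sketch, `run_workload` would configure the framework with the given thread counts and run one training or inference step. The sweep grows with the square of the number of candidate values, which illustrates why a short set of guidelines is attractive compared with exhaustive profiling.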

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.04705/full.md

## Figures

42 figures with captions in the complete paper: https://tomesphere.com/paper/1908.04705/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/1908.04705/full.md

---
Source: https://tomesphere.com/paper/1908.04705