TBD: Benchmarking and Analyzing Deep Neural Network Training
Hongyu Zhu, Mohamed Akrout, Bojian Zheng, Andrew Pelegris, Amar Phanishayee, Bianca Schroeder, and Gennady Pekhimenko

TL;DR
This paper introduces a comprehensive benchmark called TBD for evaluating DNN training across various models and applications, along with tools for performance and memory analysis on multiple frameworks and hardware setups.
Contribution
It proposes a new broad benchmark for DNN training covering diverse applications and provides a detailed performance and memory analysis toolkit for major frameworks and hardware configurations.
Findings
Identified key bottlenecks in DNN training performance.
Provided insights into memory consumption patterns during training.
Recommended optimization directions for future DNN training research.
Abstract
The recent popularity of deep neural networks (DNNs) has generated a lot of research interest in performing DNN-related computation efficiently. However, the primary focus is usually very narrow and limited to (i) inference -- i.e. how to efficiently execute already trained models and (ii) image classification networks as the primary benchmark for evaluation. Our primary goal in this work is to break this myopic view by (i) proposing a new benchmark for DNN training, called TBD (TBD is short for Training Benchmark for DNNs), that uses a representative set of DNN models that cover a wide range of machine learning applications: image classification, machine translation, speech recognition, object detection, adversarial networks, reinforcement learning, and (ii) by performing an extensive performance analysis of training these different applications on three major deep learning…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
