Understanding and Optimizing Packed Neural Network Training for Hyper-Parameter Tuning
Rui Liu, Sanjay Krishnan, Aaron J. Elmore, Michael J. Franklin

TL;DR
This paper introduces a primitive called 'pack' for jointly training multiple neural networks on a single GPU, significantly improving resource utilization and hyperparameter tuning efficiency.
Contribution
It proposes the 'pack' primitive and provides a comprehensive empirical study demonstrating its benefits and trade-offs in neural network training.
Findings
Packing two models can yield up to a 40% performance improvement over unpacked setups for a single training step, and the improvement increases when more models are packed.
Pack-aware Hyperband can be up to 2.7x faster than the original Hyperband.
The effectiveness of packing depends on several factors, including memory capacity, chip architecture, and model parameters.
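To make the memory constraint in the last finding concrete, here is a minimal sketch of grouping hyperparameter configurations into packs that each fit within one GPU's memory. It is not the paper's method: the function name `pack_configs`, the per-configuration memory estimates, and the first-fit-decreasing heuristic are all assumptions chosen for illustration.

```python
from typing import List


def pack_configs(mem_needs_gb: List[float], gpu_mem_gb: float) -> List[List[int]]:
    """Group config indices into packs whose summed memory estimate fits one GPU.

    Hypothetical helper: uses the first-fit-decreasing bin-packing heuristic,
    which is an illustrative choice and not taken from the paper.
    """
    # Visit configurations from the largest memory estimate to the smallest.
    order = sorted(range(len(mem_needs_gb)), key=lambda i: mem_needs_gb[i], reverse=True)
    packs: List[List[int]] = []  # config indices assigned to each pack
    free: List[float] = []       # remaining memory budget of each pack
    for i in order:
        need = mem_needs_gb[i]
        if need > gpu_mem_gb:
            raise ValueError(f"config {i} needs {need} GB, more than the GPU has")
        for p, room in enumerate(free):
            if need <= room:
                # Place the config in the first pack that still has room.
                packs[p].append(i)
                free[p] -= need
                break
        else:
            # No existing pack has room, so open a new one.
            packs.append([i])
            free.append(gpu_mem_gb - need)
    return packs


if __name__ == "__main__":
    # Four configurations with assumed memory estimates, on an assumed 8 GB GPU.
    print(pack_configs([6.0, 3.0, 5.0, 2.0], gpu_mem_gb=8.0))
```

Each inner list is a set of configurations that could be trained together on one device under these assumed estimates. A real scheduler would also have to weigh the other factors listed above, such as chip architecture, which this sketch ignores.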
Abstract
As neural networks are increasingly employed in machine learning practice, how to efficiently share limited training resources among a diverse set of model training tasks becomes a crucial issue. To achieve better utilization of the shared resources, we explore the idea of jointly training multiple neural network models on a single GPU in this paper. We realize this idea by proposing a primitive, called pack. We further present a comprehensive empirical study of pack and end-to-end experiments that suggest significant improvements for hyperparameter tuning. The results suggest: (1) packing two models can bring up to 40% performance improvement over unpacked setups for a single training step and the improvement increases when packing more models; (2) the benefit of the pack primitive largely depends on a number of factors including memory capacity, chip architecture, neural network…
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
Methods: Batch Normalization · Convolution · Average Pooling · Concatenated Skip Connection · Global Average Pooling · Dense Block · Kaiming Initialization · 1x1 Convolution · Dropout
