Spatial Sharing of GPU for Autotuning DNN models
Aditya Dhakal, Junguk Cho, Sameer G. Kulkarni, K. K. Ramakrishnan, Puneet Sharma

TL;DR
This paper proposes a spatial sharing approach for GPUs to multiplex multiple DNN tuning tasks, significantly improving autotuning efficiency and GPU utilization, leading to faster tuning and higher throughput.
Contribution
It introduces techniques for controlled GPU sharing and model tuning across varying resource levels, reducing autotuning time and enhancing throughput.
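As a rough illustration of what controlled spatial sharing can look like in practice, the sketch below prepares one environment per tuning task, each capped to a fraction of the GPU. It assumes NVIDIA's Multi-Process Service (MPS) is the sharing mechanism, which this summary does not state; `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` is a real MPS environment variable, while the function name, the task names, and the rule that shares must sum to at most 100 are hypothetical choices made for this example.

```python
import os

# Real CUDA MPS variable: limits the share of GPU threads an MPS client may use.
MPS_PERCENT_VAR = "CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"


def plan_spatial_shares(gpu_percent_by_task, base_env=None):
    """Build one environment dict per tuning task, each with its own GPU cap.

    gpu_percent_by_task: mapping of task name -> GPU percentage (0 < p <= 100).
    base_env: environment to copy; defaults to the current process environment.
    Assumption for this sketch: the shares must not add up to more than 100.
    """
    if base_env is None:
        base_env = dict(os.environ)

    for task, percent in gpu_percent_by_task.items():
        if not 0 < percent <= 100:
            raise ValueError(f"{task}: GPU share must be in (0, 100], got {percent}")

    total = sum(gpu_percent_by_task.values())
    if total > 100:
        raise ValueError(f"GPU shares add up to {total}%, which exceeds 100%")

    plans = {}
    for task, percent in gpu_percent_by_task.items():
        env = dict(base_env)  # copy so tasks do not share one mutable dict
        env[MPS_PERCENT_VAR] = str(percent)
        plans[task] = env
    return plans


if __name__ == "__main__":
    # Hypothetical task names and shares, chosen only for illustration.
    plans = plan_spatial_shares({"tune_model_a": 25, "tune_model_b": 25, "tune_model_c": 50})
    for task, env in plans.items():
        print(task, env[MPS_PERCENT_VAR])
```

Each returned environment could then be passed to `subprocess.Popen(..., env=env)` to start a tuning process under that cap, provided an MPS control daemon is already running on the machine. The variable only takes effect for MPS clients, so without the daemon the processes would share the GPU without any limit.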
Findings
Autotuning time decreased by up to 75%.
Throughput increased by a factor of 5.
Effective GPU multiplexing improves resource utilization.
Abstract
GPUs are used for training, inference, and tuning of machine learning models. However, Deep Neural Networks (DNNs) vary widely in their ability to exploit the full power of high-performance GPUs. Spatial sharing of a GPU enables multiplexing several DNNs on the GPU and can improve GPU utilization, thus improving throughput and lowering latency. DNN models given just the right amount of GPU resources can still provide low inference latency, as low as when the entire GPU is dedicated to their inference task. One approach to improving DNN inference is tuning the DNN model. Autotuning frameworks find the optimal low-level implementation for a given target device based on the trained machine learning model, thus reducing the DNN's inference latency and increasing inference throughput. We observe an interdependency between the tuned model and its inference latency. A DNN model tuned with…
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Ferroelectric and Negative Capacitance Devices
