DIPPM: a Deep Learning Inference Performance Predictive Model using Graph Neural Networks
Karthick Panner Selvam, Mats Brorsson

TL;DR
DIPPM is a deep learning inference performance predictive model based on graph neural networks. It predicts the inference latency, energy, and memory usage of models from various DL frameworks on the NVIDIA A100 GPU and suggests a suitable Multi-Instance GPU profile.
Contribution
The paper introduces DIPPM, a novel GNN-based model that predicts DL inference performance metrics (latency, energy, and memory) and recommends A100 Multi-Instance GPU profiles, enabling efficient hardware utilization and rapid design-space exploration.
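The selection algorithm that turns DIPPM's output into an A100 Multi-Instance GPU (MIG) profile is not detailed on this page. The following is a minimal sketch that assumes the choice depends only on predicted peak memory: it picks the smallest standard A100 40 GB MIG profile whose memory covers the prediction plus a safety margin. The function name `suggest_mig_profile` and the `headroom` parameter are hypothetical, not taken from the paper.

```python
from typing import Optional

# Standard MIG profiles on the NVIDIA A100 40 GB: (name, compute slices, memory in GB).
# Memory values are the nominal sizes encoded in the profile names.
A100_40GB_MIG_PROFILES = [
    ("1g.5gb", 1, 5),
    ("2g.10gb", 2, 10),
    ("3g.20gb", 3, 20),
    ("4g.20gb", 4, 20),
    ("7g.40gb", 7, 40),
]


def suggest_mig_profile(predicted_memory_gb: float, headroom: float = 0.1) -> Optional[str]:
    """Hypothetical rule: return the smallest profile whose memory covers the
    predicted peak memory plus a relative safety headroom, or None if none fits."""
    if predicted_memory_gb < 0 or headroom < 0:
        raise ValueError("memory and headroom must be non-negative")
    required = predicted_memory_gb * (1.0 + headroom)
    # Sort by memory first, then by compute slices, so ties prefer fewer slices.
    for name, _slices, mem_gb in sorted(A100_40GB_MIG_PROFILES, key=lambda p: (p[2], p[1])):
        if mem_gb >= required:
            return name
    return None
```

For example, a predicted 12 GB footprint needs 13.2 GB with the default 10% headroom and maps to `3g.20gb`, which is preferred over `4g.20gb` because it uses fewer compute slices. A fuller rule could also weigh predicted latency or energy; this sketch ignores them.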
Findings
Achieved a mean absolute percentage error (MAPE) of 1.9% in performance prediction.
Built a dataset of 10,508 DL models for training and evaluation.
Demonstrated effective cross-framework model parsing.
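MAPE, the metric behind the 1.9% figure, is the mean of |y − ŷ| / |y| over all predictions, expressed as a percentage. Below is a minimal Python version, assuming every true value is non-zero (MAPE is undefined otherwise):

```python
from typing import Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, returned in percent."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    # Relative error of each prediction, averaged and scaled to a percentage.
    total = sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)
```

For instance, predictions of 11 and 18 against true values of 10 and 20 are each off by 10%, so `mape([10.0, 20.0], [11.0, 18.0])` is 10.0.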
Abstract
Deep Learning (DL) has become a cornerstone of many everyday applications that we now rely on. However, making sure that a DL model uses the underlying hardware efficiently takes a lot of effort. Knowledge about inference characteristics can help find the right match, so that the model is given enough resources, but not too many. We have developed a DL Inference Performance Predictive Model (DIPPM) that predicts the inference latency, energy, and memory usage of a given input DL model on the NVIDIA A100 GPU. We also devised an algorithm to suggest the appropriate A100 Multi-Instance GPU profile from the output of DIPPM. We developed a methodology to convert DL models expressed in multiple frameworks to a generalized graph structure that is used in DIPPM. This means DIPPM can parse input DL models from various frameworks. DIPPM can be used not only to…
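The abstract describes two stages: converting a model from any framework into a generalized graph, then running a GNN over that graph to predict latency, energy, and memory. The sketch below is illustrative only and is not the authors' implementation. It assumes each node is an operator with a one-hot op-type feature, applies one GraphSAGE-style mean-aggregation layer, mean-pools the node embeddings, and uses a linear head with three outputs. All names (`OP_TYPES`, `ModelGraph`, `sage_layer`, `predict`) are hypothetical, and the weights are random and untrained.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical operator vocabulary shared across frameworks.
OP_TYPES = ["conv", "relu", "matmul", "add", "pool"]


@dataclass
class ModelGraph:
    """Framework-agnostic graph: nodes are operators, edges follow data flow."""
    ops: List[str]
    edges: List[Tuple[int, int]] = field(default_factory=list)  # (src, dst) indices

    def node_features(self) -> List[List[float]]:
        # One-hot encoding of each node's operator type (unknown ops -> all zeros).
        return [[1.0 if op == t else 0.0 for t in OP_TYPES] for op in self.ops]

    def neighbors(self) -> Dict[int, List[int]]:
        # Edges are treated as undirected for message passing.
        nbrs: Dict[int, List[int]] = {i: [] for i in range(len(self.ops))}
        for s, d in self.edges:
            nbrs[s].append(d)
            nbrs[d].append(s)
        return nbrs


def matvec(w: List[List[float]], x: List[float]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sage_layer(h, nbrs, w_self, w_nbr):
    """Mean aggregation: h_v' = ReLU(W_self h_v + W_nbr * mean of neighbor h_u)."""
    out = []
    for v, hv in enumerate(h):
        ns = nbrs[v]
        if ns:
            agg = [sum(h[u][k] for u in ns) / len(ns) for k in range(len(hv))]
        else:
            agg = [0.0] * len(hv)
        z = [a + b for a, b in zip(matvec(w_self, hv), matvec(w_nbr, agg))]
        out.append([max(0.0, zi) for zi in z])
    return out


def predict(graph: ModelGraph, hidden: int = 8, seed: int = 0) -> Dict[str, float]:
    """Untrained forward pass returning placeholder latency/energy/memory values."""
    if not graph.ops:
        raise ValueError("graph has no nodes")
    rng = random.Random(seed)

    def rand(rows: int, cols: int) -> List[List[float]]:
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    d = len(OP_TYPES)
    w_self, w_nbr, w_out = rand(hidden, d), rand(hidden, d), rand(3, hidden)
    h = sage_layer(graph.node_features(), graph.neighbors(), w_self, w_nbr)
    pooled = [sum(col) / len(h) for col in zip(*h)]  # mean-pool over all nodes
    latency, energy, memory = matvec(w_out, pooled)
    return {"latency": latency, "energy": energy, "memory": memory}
```

A small chain such as `ModelGraph(ops=["conv", "relu", "pool", "matmul"], edges=[(0, 1), (1, 2), (2, 3)])` can be passed to `predict`, but its three numbers carry no meaning until the weights are trained on a dataset of measured models. Because the graph abstraction is shared, the same `predict` call works no matter which framework the operators were extracted from.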
