PM2Lat: Highly Accurate and Generalized Prediction of DNN Execution Latency on GPUs
Truong-Thanh Le, Hoang-Loc La, Amir Taherkordi, Frank Eliassen, Phuong Hoai Ha, and Peiyuan Guan

TL;DR
PM2Lat is a framework that predicts DNN execution latency on GPUs by modeling kernel behavior, and it reports lower prediction error than existing methods across diverse models and hardware platforms.
Contribution
It introduces a kernel-aware GPU latency prediction framework that generalizes to complex kernels and achieves superior accuracy over prior approaches.
Findings
Prediction error below 10% across data types and hardware.
Outperforms NeuSight by 10-20% for FP32 and 50% for BF16.
Maintains 3-8% error on diverse GPU kernels.
Abstract
We present PM2Lat, a fast and generalized framework for accurately predicting the latency of deep neural network models on GPUs, with a special focus on NVIDIA. Unlike prior methods that rely on deep learning models or handcrafted heuristics, PM2Lat leverages the Single-Instruction-Multiple-Thread architecture of GPUs to model the execution time of DNN models. First, we dive into fine-grained GPU operation modeling by studying computational behavior and memory access patterns. After identifying these characteristics, we found that different GPU kernels exhibit significant performance disparities, even when serving the same purpose. Hence, the core idea of PM2Lat is to differentiate kernels based on their configurations and analyze them accordingly. This kernel-aware modeling enables PM2Lat to achieve consistently low prediction error across diverse data types and hardware platforms. In…
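The kernel-aware idea in the abstract can be illustrated with a minimal sketch. It is not PM2Lat's actual model: it assumes, purely for illustration, that each kernel configuration gets its own least-squares line mapping a work measure (here called `work`) to latency, and that error is reported as mean absolute percentage error. The class name, function names, and kernel labels below are all hypothetical.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # All x values identical: fall back to a constant prediction.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


class KernelAwarePredictor:
    """Hypothetical sketch: one latency model per kernel configuration."""

    def __init__(self):
        self.models = {}  # kernel_id -> (slope, intercept)

    def fit(self, samples):
        """samples: iterable of (kernel_id, work, measured_latency_ms)."""
        grouped = defaultdict(list)
        for kernel_id, work, latency in samples:
            grouped[kernel_id].append((work, latency))
        for kernel_id, points in grouped.items():
            xs, ys = zip(*points)
            self.models[kernel_id] = fit_line(xs, ys)

    def predict_kernel(self, kernel_id, work):
        slope, intercept = self.models[kernel_id]
        return slope * work + intercept

    def predict_model(self, kernel_calls):
        """Model latency as the sum over its (kernel_id, work) calls."""
        return sum(self.predict_kernel(k, w) for k, w in kernel_calls)


def mape(predicted, actual):
    """Mean absolute percentage error, in percent."""
    total = sum(abs(p - a) / a for p, a in zip(predicted, actual))
    return 100.0 * total / len(actual)


# Synthetic measurements (made up): two kernels serving the same purpose
# but with different cost profiles.
samples = [
    ("gemm_tile_a", 1.0, 3.0), ("gemm_tile_a", 2.0, 5.0), ("gemm_tile_a", 3.0, 7.0),
    ("gemm_tile_b", 1.0, 10.0), ("gemm_tile_b", 3.0, 20.0),
]
predictor = KernelAwarePredictor()
predictor.fit(samples)
total_latency = predictor.predict_model([("gemm_tile_a", 4.0), ("gemm_tile_b", 2.0)])
```

Keeping a separate model per kernel configuration is what lets the two synthetic kernels above produce different predictions for the same amount of work; a single model pooled across kernels would average the two cost profiles together.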
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Big Data and Digital Economy
