Forecasting GPU Performance for Deep Learning Training and Inference
Seonho Lee, Amar Phanishayee, Divya Mahajan

TL;DR
NeuSight is a performance prediction framework for deep learning workloads on GPUs that accurately estimates training and inference latency on unseen hardware, outperforming prior models by decomposing predictions into smaller, tile-based components.
Contribution
Introduces NeuSight, a novel GPU performance prediction framework that leverages hardware behavior and software optimizations, using a tile-based decomposition approach for improved accuracy on unseen models and GPUs.
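The tile-based idea can be illustrated with a minimal sketch for a single matrix multiplication (GEMM). It assumes that the kernel's output is split into fixed-size tiles, that tiles are scheduled in waves across the streaming multiprocessors (SMs), and that one tile's latency follows a simple roofline bound. All names (`GPUSpec`, `tile_latency`, `predict_gemm_latency`), the even per-SM split of bandwidth, the 2-byte element size, and the constant `utilization` factor are hypothetical stand-ins; this is not NeuSight's actual predictor.

```python
import math
from dataclasses import dataclass


@dataclass
class GPUSpec:
    """Hypothetical hardware description supplied by the caller."""
    num_sms: int          # number of streaming multiprocessors
    peak_flops: float     # peak FLOP/s for the whole GPU
    mem_bandwidth: float  # off-chip memory bandwidth, bytes/s


def tile_latency(tile_m: int, tile_n: int, k: int, gpu: GPUSpec,
                 bytes_per_elem: int = 2, utilization: float = 1.0) -> float:
    """Roofline-style latency (s) of one output tile computed on one SM."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    flops = 2 * tile_m * tile_n * k  # one multiply-add counts as 2 FLOPs
    # The tile reads a tile_m x k slice of A and a k x tile_n slice of B,
    # and writes a tile_m x tile_n slice of C (no cache reuse is modeled).
    # bytes_per_elem=2 assumes FP16/BF16 operands.
    bytes_moved = bytes_per_elem * (tile_m * k + k * tile_n + tile_m * tile_n)
    # Assumption: compute and bandwidth are split evenly across SMs.
    sm_flops = gpu.peak_flops / gpu.num_sms
    sm_bw = gpu.mem_bandwidth / gpu.num_sms
    ideal = max(flops / sm_flops, bytes_moved / sm_bw)
    # Utilization below 1 stretches the ideal time (stand-in for a learned model).
    return ideal / utilization


def predict_gemm_latency(m: int, n: int, k: int, gpu: GPUSpec,
                         tile_m: int = 128, tile_n: int = 128,
                         utilization: float = 1.0) -> float:
    """Aggregate per-tile latency into a kernel latency by counting waves."""
    # Partial tiles at the edges are counted as full tiles.
    num_tiles = math.ceil(m / tile_m) * math.ceil(n / tile_n)
    # Each wave runs up to one tile per SM; waves execute one after another.
    waves = math.ceil(num_tiles / gpu.num_sms)
    return waves * tile_latency(tile_m, tile_n, k, gpu, utilization=utilization)


# Toy GPU with made-up numbers (not a real device).
toy_gpu = GPUSpec(num_sms=4, peak_flops=4e12, mem_bandwidth=4e11)
one_wave = predict_gemm_latency(256, 256, 128, toy_gpu)   # 4 tiles -> 1 wave
two_waves = predict_gemm_latency(512, 256, 128, toy_gpu)  # 8 tiles -> 2 waves
print(one_wave, two_waves)
```

Under these assumptions, a 512×256 output on the 4-SM toy GPU needs 8 tiles and therefore two waves, so it is predicted to take twice as long as a 256×256 output that fits in a single wave. This wave quantization is one reason kernel latency does not scale smoothly with problem size, and it is the kind of effect a tile-level view can capture.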
Findings
Compared with prior work, NeuSight reduces the latency prediction error for GPT-3 on an H100 GPU from over 120% to 2.3%.
The framework outperforms prior regression and neural network models across various workloads.
NeuSight generalizes well to unseen models and hardware, demonstrating high prediction accuracy.
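The findings quote prediction errors as percentages. The summary does not define the metric; a common choice, assumed here, is absolute percentage error, |predicted − measured| / measured × 100, averaged over workloads. A minimal sketch follows; the function names are illustrative and the example values are made up, not taken from the paper.

```python
def percent_error(predicted: float, measured: float) -> float:
    """Absolute percentage error of one latency prediction."""
    if measured <= 0:
        raise ValueError("measured latency must be positive")
    return abs(predicted - measured) / measured * 100.0


def mean_percent_error(pairs: list[tuple[float, float]]) -> float:
    """Mean absolute percentage error over (predicted, measured) pairs."""
    if not pairs:
        raise ValueError("need at least one (predicted, measured) pair")
    return sum(percent_error(p, m) for p, m in pairs) / len(pairs)


# Illustrative values in milliseconds (made up, not from the paper).
print(percent_error(2.2, 1.0))  # prediction 2.2 ms vs. measured 1.0 ms
print(mean_percent_error([(1.1, 1.0), (0.9, 1.0)]))
```

Under this definition, with non-negative predictions, an error above 100% can only come from an overestimate, since an underestimate is off by at most 100%.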
Abstract
Deep learning kernels exhibit predictable memory accesses and compute patterns, making GPUs' parallel architecture well-suited for their execution. Software and runtime systems for GPUs are optimized to better utilize the streaming multiprocessors, on-chip cache, and off-chip high-bandwidth memory. As deep learning models and GPUs evolve, access to newer GPUs is often limited, raising questions about the performance of new model architectures on existing GPUs, existing models on new GPUs, and new model architectures on new GPUs. To address these questions, we introduce NeuSight, a framework to predict the performance of various deep learning models, for both training and inference, on unseen GPUs without requiring actual execution. The framework leverages both GPU hardware behavior and software library optimizations to estimate end-to-end performance. Previous work uses regression models…
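One way to estimate performance from hardware specifications alone, shown here only as a simple baseline and not as NeuSight's method, is the roofline model: each kernel is limited either by peak compute or by off-chip memory bandwidth, and end-to-end latency is approximated by summing per-kernel bounds. It ignores on-chip cache effects and software library optimizations. The `Kernel` and `GPU` classes are hypothetical, and all kernel and GPU figures below are made up.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    name: str
    flops: float        # floating-point operations performed
    bytes_moved: float  # bytes transferred to/from off-chip memory


@dataclass
class GPU:
    name: str
    peak_flops: float     # peak FLOP/s
    mem_bandwidth: float  # off-chip memory bandwidth, bytes/s


def roofline_time(kernel: Kernel, gpu: GPU) -> float:
    """Lower-bound latency (s): the slower of compute time and memory time."""
    return max(kernel.flops / gpu.peak_flops,
               kernel.bytes_moved / gpu.mem_bandwidth)


def bound_by(kernel: Kernel, gpu: GPU) -> str:
    """Compare arithmetic intensity (FLOP/byte) with the GPU's ridge point."""
    intensity = kernel.flops / kernel.bytes_moved
    ridge = gpu.peak_flops / gpu.mem_bandwidth
    return "compute" if intensity >= ridge else "memory"


def end_to_end_time(kernels: list[Kernel], gpu: GPU) -> float:
    """Sum per-kernel bounds, assuming kernels run back to back with no overlap."""
    return sum(roofline_time(k, gpu) for k in kernels)


# Made-up workload and GPUs, for illustration only.
kernels = [Kernel("matmul", flops=2e12, bytes_moved=1e10),
           Kernel("layernorm", flops=1e9, bytes_moved=4e9)]
old_gpu = GPU("old", peak_flops=1e14, mem_bandwidth=1e12)
new_gpu = GPU("new", peak_flops=4e14, mem_bandwidth=3e12)

for gpu in (old_gpu, new_gpu):
    for k in kernels:
        print(gpu.name, k.name, bound_by(k, gpu), roofline_time(k, gpu))
    print(gpu.name, "total", end_to_end_time(kernels, gpu))
```

With these numbers, the new GPU has 4× the peak FLOP/s but only 3× the bandwidth, so the compute-bound matmul speeds up 4× while the memory-bound layernorm speeds up only 3×. Which resource bounds each kernel therefore changes how a model's latency transfers from one GPU to another.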
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Advanced Neural Network Applications
