Building a Performance Model for Deep Learning Recommendation Model Training on GPUs
Zhongyi Lin, Louis Feng, Ehsan K. Ardestani, Jaewon Lee, John Lundell, Changkyu Kim, Arun Kejariwal, John D. Owens

TL;DR
This paper presents a performance model for GPU training of Deep Learning Recommendation Models, accurately predicting training times by modeling kernel performance and operator overheads, enabling better system design.
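As a rough illustration of what a heuristic kernel model can look like, the sketch below estimates a kernel's runtime with a roofline bound (the slower of its compute time and its memory-traffic time). It also counts the DRAM traffic of a pooled embedding lookup, the memory-bound operator typical of DLRM. This is a minimal sketch, not the authors' model. The function names, the no-cache-reuse assumption, and the peak-throughput figures (roughly those of an NVIDIA V100) are all assumptions. The ML-based models the paper also uses are not shown.

```python
def roofline_time_us(flops: float, bytes_moved: float,
                     peak_tflops: float, peak_gbs: float) -> float:
    """Heuristic kernel time in microseconds: the slower of the
    compute-bound and memory-bound estimates (a lower bound in practice)."""
    compute_us = flops / (peak_tflops * 1e12) * 1e6
    memory_us = bytes_moved / (peak_gbs * 1e9) * 1e6
    return max(compute_us, memory_us)


def embedding_bag_fwd_bytes(batch: int, tables: int, pooling: int, dim: int,
                            elem_bytes: int = 4, index_bytes: int = 8) -> int:
    """DRAM traffic of a pooled embedding lookup (hypothetical helper).
    Assumes every looked-up row is read from DRAM, i.e. no cache reuse."""
    lookups = batch * tables * pooling
    rows = lookups * dim * elem_bytes             # one embedding row read per lookup
    indices = lookups * index_bytes               # the index array itself
    outputs = batch * tables * dim * elem_bytes   # one pooled vector written per bag
    return rows + indices + outputs


# Usage with assumed V100-like peaks: 15.7 TFLOPS FP32, 900 GB/s HBM2.
# The pooling adds are negligible next to the memory traffic, so flops=0 here.
nbytes = embedding_bag_fwd_bytes(batch=2048, tables=8, pooling=32, dim=64)
est_us = roofline_time_us(flops=0.0, bytes_moved=nbytes,
                          peak_tflops=15.7, peak_gbs=900.0)
```

With these assumed peaks the example lookup moves about 143 MB and the bound comes out near 158 µs. A real kernel usually runs slower than its roofline bound, which is one reason such heuristics need calibration against measured kernel times.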
Contribution
The paper introduces a novel performance modeling approach combining heuristic and ML-based kernel models with operator overhead categorization for DLRM training on GPUs.
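The overhead side can be sketched as a per-operator lookup table built from a host-side profile. The five category names below are hypothetical placeholders, and the paper's own five types may be defined differently. The trace-record format and the fallback to a mean shared across all operators are also assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical overhead categories, for illustration only.
CATEGORIES = ("inter_op", "pre_launch", "launch", "between_launch", "post_launch")


def summarize_overheads(records):
    """records: iterable of (op_name, category, duration_us) tuples taken from
    a host-side trace (assumed format).
    Returns ({op_name: {category: mean_us}}, {category: mean_us across all ops})."""
    per_op = defaultdict(lambda: defaultdict(list))
    shared = defaultdict(list)
    for op, cat, dur in records:
        if cat not in CATEGORIES:
            raise ValueError(f"unknown overhead category: {cat}")
        per_op[op][cat].append(dur)
        shared[cat].append(dur)
    per_op_mean = {op: {c: mean(v) for c, v in cats.items()}
                   for op, cats in per_op.items()}
    shared_mean = {c: mean(v) for c, v in shared.items()}
    return per_op_mean, shared_mean


def lookup_overhead(per_op_mean, shared_mean, op, cat):
    """Per-operator mean if that op was profiled for this category,
    else the shared mean across all ops, else 0.0."""
    return per_op_mean.get(op, {}).get(cat, shared_mean.get(cat, 0.0))


# Usage on a tiny made-up trace:
trace = [("addmm", "launch", 5.0), ("addmm", "launch", 7.0),
         ("relu", "launch", 3.0), ("addmm", "pre_launch", 10.0)]
per_op, shared = summarize_overheads(trace)
relu_pre = lookup_overhead(per_op, shared, "relu", "pre_launch")  # falls back to shared
```

Averaging per category keeps the table small, and the shared fallback lets an operator that was never profiled in a given category still receive a plausible estimate.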
Findings
Achieves less than 10% geometric mean average error (GMAE) in all kernel performance models
Predicts GPU active time and per-batch training time with under 8% geomean error
Demonstrates applicability to various ML models beyond DLRM
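The error figures above are geometric means. Assuming GMAE is the geometric mean of per-sample absolute relative errors, |predicted - measured| / measured, it can be computed as follows. The function name `gmae` is hypothetical.

```python
import math


def gmae(predicted, measured):
    """Geometric mean of absolute relative errors (assumed definition of GMAE).
    Measured values must be positive."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    if any(m <= 0 for m in measured):
        raise ValueError("measured values must be positive")
    rel_errors = [abs(p - m) / m for p, m in zip(predicted, measured)]
    return math.prod(rel_errors) ** (1.0 / len(rel_errors))


# Relative errors of 5% and 20% give a geometric mean of about 10%.
err = gmae([1.05, 2.4], [1.0, 2.0])
```

Because the errors are multiplied, a single exact prediction drives the GMAE to zero. A single large outlier also pulls it up less than it would pull up an arithmetic mean.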
Abstract
We devise a performance model for GPU training of Deep Learning Recommendation Models (DLRM), whose GPU utilization is low compared to other well-optimized CV and NLP models. We show that both the device active time (the sum of kernel runtimes) and the device idle time are important components of the overall device time. We therefore tackle them separately by (1) flexibly adopting heuristic-based and ML-based kernel performance models for operators that dominate the device active time, and (2) categorizing operator overheads into five types to quantitatively determine their contribution to the device idle time. Combining these two parts, we propose a critical-path-based algorithm to predict the per-batch training time of DLRM by traversing its execution graph. We achieve less than 10% geometric mean average error (GMAE) in all kernel performance modeling, and 4.61% and 7.96%…
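To make the critical-path idea concrete, here is a heavily simplified sketch. It assumes one CPU thread issues operators in order and every kernel goes to a single GPU stream. Each kernel starts at the later of two events: its launch call returning, or the previous kernel finishing. Operators flagged `syncs` block the host until the GPU drains. The `Op` fields, `gap_us`, and `predict_batch_time` are hypothetical names. The paper traverses a general execution graph, whereas this sketch handles only a linear chain.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Op:
    """One operator in a linearized execution graph (hypothetical structure)."""
    name: str
    pre_us: float     # host time from op start to its first kernel launch
    launch_us: float  # host time spent in each kernel-launch call
    post_us: float    # host time after the last launch until op end
    kernels_us: List[float] = field(default_factory=list)  # predicted kernel times
    syncs: bool = False  # True if the host waits for the GPU (e.g. a device-to-host copy)


def predict_batch_time(ops: List[Op], gap_us: float = 0.0) -> Tuple[float, float]:
    """Return (per-batch time, GPU active time) for one CPU thread and one GPU stream."""
    cpu = 0.0     # host clock
    gpu = 0.0     # time at which the stream finishes its last queued kernel
    active = 0.0  # sum of kernel runtimes
    for op in ops:
        cpu += gap_us + op.pre_us       # gap_us: assumed fixed host gap between ops
        for k in op.kernels_us:
            cpu += op.launch_us         # kernel is enqueued once the launch call returns
            start = max(cpu, gpu)       # critical path: later of launch and previous kernel
            gpu = start + k
            active += k
        if op.syncs:
            cpu = max(cpu, gpu)         # host stalls until queued kernels finish
        cpu += op.post_us
    return max(cpu, gpu), active


# Usage: one long kernel, then an op with two tiny kernels.
ops = [Op("a", 10, 5, 2, [100.0]), Op("b", 10, 5, 2, [1.0, 1.0])]
total, active = predict_batch_time(ops)  # total == 117.0, active == 102.0
```

In the example the 100 µs kernel keeps the GPU busy while the host races ahead, so the batch is GPU-bound and only 15 µs is idle. With short kernels the host-side overheads set the pace instead and most of the per-batch time becomes GPU idle time, the low-utilization regime the abstract describes for DLRM.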
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Recommender Systems and Techniques
