Building a Performance Model for Deep Learning Recommendation Model Training on GPUs

Zhongyi Lin; Louis Feng; Ehsan K. Ardestani; Jaewon Lee; John Lundell; Changkyu Kim; Arun Kejariwal; John D. Owens

arXiv:2201.07821 · cs.LG · November 18, 2022

TL;DR

This paper presents a performance model for GPU training of Deep Learning Recommendation Models, accurately predicting training times by modeling kernel performance and operator overheads, enabling better system design.

Contribution

The paper introduces a novel performance modeling approach combining heuristic and ML-based kernel models with operator overhead categorization for DLRM training on GPUs.

Findings

1. Achieves less than 10% error in kernel performance prediction
2. Predicts GPU active time and total training time with around 5-8% error
3. Demonstrates applicability to various ML models beyond DLRM

Abstract

We devise a performance model for GPU training of Deep Learning Recommendation Models (DLRM), whose GPU utilization is low compared to other well-optimized CV and NLP models. We show that both the device active time (the sum of kernel runtimes) and the device idle time are important components of the overall device time. We therefore tackle them separately by (1) flexibly adopting heuristic-based and ML-based kernel performance models for operators that dominate the device active time, and (2) categorizing operator overheads into five types to determine quantitatively their contribution to the device active time. Combining these two parts, we propose a critical-path-based algorithm to predict the per-batch training time of DLRM by traversing its execution graph. We achieve less than 10% geometric mean average error (GMAE) in all kernel performance modeling, and 4.61% and 7.96%…
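
The idea of combining predicted kernel runtimes with host-side overheads can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single GPU stream, operators executed in order, and only two hypothetical overhead terms per operator (`cpu_overhead_us` and `launch_overhead_us`) in place of the five overhead types mentioned in the abstract. The names `Op` and `predict_batch_time` are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Op:
    """One operator in a linearized execution graph (hypothetical structure)."""
    name: str
    cpu_overhead_us: float                 # host time spent before launching kernels
    kernel_times_us: List[float] = field(default_factory=list)  # predicted kernel runtimes
    launch_overhead_us: float = 0.0        # host time per kernel launch


def predict_batch_time(ops: List[Op]) -> Tuple[float, float]:
    """Return (total_time_us, gpu_active_time_us) for one batch.

    Two timelines are tracked. A kernel can start only after the host has
    launched it and after the previous kernel on the GPU has finished, so
    its start time is the later of the two.
    """
    cpu_time = 0.0    # host timeline
    gpu_time = 0.0    # device timeline (end of the last kernel)
    gpu_active = 0.0  # sum of kernel runtimes

    for op in ops:
        cpu_time += op.cpu_overhead_us
        for kernel_us in op.kernel_times_us:
            cpu_time += op.launch_overhead_us
            start = max(gpu_time, cpu_time)  # device idles if the host is late
            gpu_time = start + kernel_us
            gpu_active += kernel_us

    return max(cpu_time, gpu_time), gpu_active


# Made-up numbers, for illustration only.
ops = [
    Op("embedding_lookup", cpu_overhead_us=10, kernel_times_us=[50], launch_overhead_us=5),
    Op("linear", cpu_overhead_us=20, kernel_times_us=[5], launch_overhead_us=5),
]
total_us, active_us = predict_batch_time(ops)
```

With these made-up inputs the sketch gives a total of 70 µs, of which 55 µs is device active time; the remaining 15 µs is device idle time caused by host overhead before the first kernel starts. When host overheads are large relative to kernel runtimes, the idle share grows, which is the situation the abstract describes for DLRM.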

Code & Models

Repositories

owensgroup/ml_perf_model
PyTorch · Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Recommender Systems and Techniques