BagPipe: Accelerating Deep Recommendation Model Training

Saurabh Agarwal; Chengpo Yan; Ziyi Zhang; Shivaram Venkataraman

arXiv:2202.12429 · cs.DC · November 2, 2023

TL;DR

BagPipe uses caching and lookahead over future training batches to accelerate deep recommendation model training, reporting up to 5.6x speedup over baselines while maintaining convergence and reproducibility.

Contribution

This paper presents BagPipe, a system for DLRM training that looks ahead at future batches to learn embedding access patterns and uses caching to reduce the overhead of embedding access.

Findings

01 Achieves up to 5.6x speedup over baselines

02 Reduces synchronization overheads in distributed training

03 Maintains convergence and reproducibility guarantees

Abstract

Deep learning-based recommendation models (DLRM) are widely used in several business-critical applications. Training such recommendation models efficiently is challenging because they contain billions of embedding-based parameters, leading to significant overheads from embedding access. By profiling existing systems for DLRM training, we observe that around 75% of the iteration time is spent on embedding access and model synchronization. Our key insight in this paper is that embedding access has a specific structure which can be used to accelerate training. We observe that embedding accesses are heavily skewed, with around 1% of embeddings representing more than 92% of total accesses. Further, we observe that during offline training we can look ahead at future batches to determine exactly which embeddings will be needed at what iteration in the future. Based on these insights, we…
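The two observations in the abstract (skewed accesses and lookahead over future batches) can be illustrated with a minimal Python sketch. This is not BagPipe's implementation: the function names `top_share` and `last_use_within_window`, the `window` parameter, and the representation of a batch as a list of integer embedding IDs are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Sequence


def top_share(batches: Sequence[Sequence[int]], fraction: float) -> float:
    """Share of all accesses that go to the most-accessed `fraction`
    of distinct embedding IDs (hypothetical helper, for illustration)."""
    counts = Counter(emb_id for batch in batches for emb_id in batch)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Keep at least one ID so that tiny inputs still give a meaningful answer.
    k = max(1, int(len(counts) * fraction))
    hottest = sum(count for _, count in counts.most_common(k))
    return hottest / total


def last_use_within_window(
    batches: Sequence[Sequence[int]], start: int, window: int
) -> Dict[int, int]:
    """Look ahead at batches[start : start + window] and map each embedding
    ID to the last iteration in that window that needs it (hypothetical)."""
    last_use: Dict[int, int] = {}
    for iteration in range(start, min(start + window, len(batches))):
        for emb_id in batches[iteration]:
            # Later iterations overwrite earlier ones, leaving the last use.
            last_use[emb_id] = iteration
    return last_use


if __name__ == "__main__":
    # Toy data: embedding 1 appears in every batch, the rest appear once.
    batches = [[1, 2], [1, 3], [1, 4]]
    print(top_share(batches, 0.25))
    print(last_use_within_window(batches, start=0, window=2))
```

Because offline training data is known in advance, a schedule like the one returned by `last_use_within_window` could, under these assumptions, tell a cache which embeddings to prefetch and when an entry is no longer needed within the window. How BagPipe actually does this is not stated in the truncated abstract.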

Taxonomy

Topics: Caching and Content Delivery · Cloud Computing and Resource Management · Stochastic Gradient Optimization Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings