UpDLRM: Accelerating Personalized Recommendation using Real-World PIM Architecture

Sitian Chen; Haobin Tan; Amelie Chi Zhou; Yusen Li; Pavan Balaji

arXiv:2406.13941·cs.IR·October 10, 2024

TL;DR

UpDLRM leverages real-world PIM hardware to significantly accelerate deep learning recommendation models by enhancing memory bandwidth and reducing inference latency, outperforming CPU and GPU solutions.

Contribution

The paper introduces UpDLRM, a novel approach that utilizes PIM hardware to optimize embedding lookup performance in DLRMs, addressing memory bottlenecks.

Findings

01

UpDLRM achieves lower inference times than CPU-only systems.

02

UpDLRM uses PIM hardware to increase the memory bandwidth available to recommendation models.

03

Effective embedding table partitioning improves workload balance and data caching.
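To make the workload-balance idea concrete, here is a minimal sketch of one way to spread embedding rows across memory banks so that frequently accessed rows do not pile up in one place. It is not the paper's partitioning method: the greedy "hottest row goes to the least-loaded bank" heuristic, the function name partition_rows, and the access counts are all assumptions chosen for illustration.

```python
import heapq


def partition_rows(access_counts, num_banks):
    """Assign each embedding row to a bank so per-bank access load stays even.

    Illustrative greedy heuristic (an assumption, not the paper's algorithm):
    place the most frequently accessed rows first, each into the bank that
    currently has the smallest total load.
    """
    # Min-heap of (current_load, bank_id); the least-loaded bank pops first.
    heap = [(0, bank) for bank in range(num_banks)]
    heapq.heapify(heap)

    assignment = {}
    hottest_first = sorted(range(len(access_counts)),
                           key=lambda row: -access_counts[row])
    for row in hottest_first:
        load, bank = heapq.heappop(heap)
        assignment[row] = bank
        heapq.heappush(heap, (load + access_counts[row], bank))

    # Recompute the final load per bank for inspection.
    loads = [0] * num_banks
    for row, bank in assignment.items():
        loads[bank] += access_counts[row]
    return assignment, loads


# Hypothetical, skewed access counts for 8 embedding rows.
counts = [90, 5, 40, 3, 30, 20, 10, 2]
assignment, loads = partition_rows(counts, num_banks=4)
print(loads)
```

With skewed counts like these, a single very hot row still dominates its bank, which is one reason balancing alone may not suffice and caching of hot data also matters.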

Abstract

Deep Learning Recommendation Models (DLRMs) have gained popularity in recommendation systems due to their effectiveness in handling large-scale recommendation tasks. The embedding layers of DLRMs have become the performance bottleneck due to their intensive demands on memory capacity and memory bandwidth. In this paper, we propose UpDLRM, which utilizes real-world processing-in-memory (PIM) hardware, UPMEM DPU, to boost the memory bandwidth and reduce recommendation latency. The parallel nature of the DPU memory can provide high aggregated bandwidth for the large number of irregular memory accesses in embedding lookups, thus offering great potential to reduce the inference latency. To fully utilize the DPU memory bandwidth, we further studied the embedding table partitioning problem to achieve good workload balance and efficient data caching. Evaluations using real-world datasets show…
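The embedding lookup described in the abstract can be pictured with a small sketch: each bank holds a slice of the table, sums the requested rows it owns, and the host adds the partial sums together. This is plain Python for illustration only and does not use the UPMEM SDK. The round-robin row placement, sum pooling, and the helper names shard_table, bank_partial_sum, and pooled_lookup are assumptions, not details taken from the paper.

```python
def shard_table(table, num_banks):
    """Split rows round-robin (an assumed layout): row r lives in bank r % num_banks."""
    banks = [dict() for _ in range(num_banks)]
    for row_id, vector in enumerate(table):
        banks[row_id % num_banks][row_id] = vector
    return banks


def bank_partial_sum(bank, indices, dim):
    """Work done locally by one bank: sum the requested rows that it owns."""
    partial = [0.0] * dim
    for row_id in indices:
        if row_id in bank:
            for d, value in enumerate(bank[row_id]):
                partial[d] += value
    return partial


def pooled_lookup(banks, indices, dim):
    """Host side: add up the per-bank partial sums into one pooled vector."""
    result = [0.0] * dim
    for bank in banks:  # sequential here; PIM banks would work in parallel
        for d, value in enumerate(bank_partial_sum(bank, indices, dim)):
            result[d] += value
    return result


# Toy 10-row table with 2-dimensional embeddings.
table = [[float(r), float(2 * r)] for r in range(10)]
banks = shard_table(table, num_banks=4)

indices = [7, 1, 7, 4]  # irregular row ids; repeats are allowed
pooled = pooled_lookup(banks, indices, dim=2)

# Reference result computed directly on the unsharded table.
reference = [sum(table[i][d] for i in indices) for d in range(2)]
print(pooled, reference)
```

Because each bank touches only its own rows, the scattered reads are served by many memories at once, which is the source of the aggregated bandwidth the abstract refers to.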
