Merlin HugeCTR: GPU-accelerated Recommender System Training and Inference

Joey Wang; Yingcan Wei; Minseok Lee; Matthias Langer; Fan Yu; Jie Liu; Alex Liu; Daniel Abel; Gems Guo; Jianbing Dong; Jerry Shi; Kunlun Li

arXiv:2210.08803·cs.DC·October 18, 2022

TL;DR

Merlin HugeCTR is an open-source GPU-accelerated framework that significantly speeds up training and inference of recommendation models, enabling scalable, low-latency online deployment with high performance on single and multi-node systems.

Contribution

The paper introduces a GPU-optimized architecture that combines hierarchical storage with a parameter server, achieving substantial speedups over CPU-based methods for recommender systems.

Findings

01. Up to 24.6x speedup on the MLPerf v1.0 DLRM training benchmark with a single DGX A100 (8x A100), compared with PyTorch on 4x4-socket CPU nodes

02. 5-62x inference speedup over CPU implementations using the hierarchical parameter server

03. Effective multi-node training acceleration for large-scale recommendation models

Abstract

In this talk, we introduce Merlin HugeCTR. Merlin HugeCTR is an open-source, GPU-accelerated integration framework for click-through rate estimation. It optimizes both training and inference, whilst enabling model training at scale with model-parallel embeddings and data-parallel neural networks. In particular, Merlin HugeCTR combines a high-performance GPU embedding cache with a hierarchical storage architecture to realize low-latency retrieval of embeddings for online model inference tasks. In the MLPerf v1.0 DLRM model training benchmark, Merlin HugeCTR achieves a speedup of up to 24.6x on a single DGX A100 (8x A100) over PyTorch on 4x4-socket CPU nodes (4x4x28 cores). Merlin HugeCTR can also take advantage of multi-node environments to accelerate training even further. Since late 2021, Merlin HugeCTR additionally features a hierarchical parameter server (HPS) and supports…
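
The idea of pairing a small, fast embedding cache with slower storage tiers can be illustrated with a minimal sketch. This is not HugeCTR code or its API: the class name `TieredEmbeddingStore`, the three tiers, and the LRU eviction policy are all assumptions chosen for illustration, with plain Python dictionaries standing in for GPU memory, host memory, and a persistent store.

```python
from collections import OrderedDict


class TieredEmbeddingStore:
    """Toy three-tier embedding lookup (hypothetical, not the HugeCTR API)."""

    def __init__(self, backing_store, cache_capacity=2):
        self.cache = OrderedDict()    # stand-in for a small GPU embedding cache (LRU)
        self.memory = {}              # stand-in for a larger host-memory tier
        self.backing = backing_store  # stand-in for a slow persistent tier
        self.cache_capacity = cache_capacity
        self.hits = {"cache": 0, "memory": 0, "backing": 0}

    def lookup(self, key):
        # Fastest tier first: a cache hit also refreshes the key's recency.
        if key in self.cache:
            self.cache.move_to_end(key)
            self.hits["cache"] += 1
            return self.cache[key]
        # Second tier: host memory.
        if key in self.memory:
            self.hits["memory"] += 1
            vector = self.memory[key]
        else:
            # Slowest tier: fetch and promote into host memory.
            self.hits["backing"] += 1
            vector = self.backing[key]
            self.memory[key] = vector
        self._insert_into_cache(key, vector)
        return vector

    def _insert_into_cache(self, key, vector):
        # Insert as most recently used; evict the least recently used if full.
        self.cache[key] = vector
        self.cache.move_to_end(key)
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)


# Usage: three embeddings in the slow tier, a cache that holds only two.
store = TieredEmbeddingStore(
    {1: [0.1, 0.2], 2: [0.3, 0.4], 3: [0.5, 0.6]}, cache_capacity=2
)
for key in [1, 2, 1, 3, 2]:
    store.lookup(key)
print(store.hits)
```

Frequently requested keys stay in the fast tier, so repeated lookups avoid the slower tiers. A production system would batch lookups and move tensors between devices rather than Python lists.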
