A Frequency-aware Software Cache for Large Recommendation System Embeddings

Jiarui Fang; Geng Zhang; Jiatong Han; Shenggui Li; Zhengda Bian; Yongbin Li; Jin Liu; Yang You

arXiv:2208.05321 · cs.IR · August 11, 2022 · 1 citation

TL;DR

This paper introduces a frequency-aware software cache for large recommendation system embeddings, enabling efficient GPU training by dynamically managing embedding data between CPU and GPU memory.

Contribution

It presents a novel GPU-based software cache that uses ID frequency statistics to manage embeddings in DLRMs and scales to multiple GPUs.
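As a hedged illustration of what "frequency statistics" could mean in practice, the sketch below counts how often each embedding ID appears in a sample of the training data, ranks IDs from most to least frequent, and selects the top-ranked rows to preload into a fixed-size GPU cache. The function names `build_frequency_remap` and `warmup_ids` and the tie-breaking rule are hypothetical choices made for this example; the paper's repository may organize this step differently.

```python
from collections import Counter


def build_frequency_remap(id_stream):
    """Rank raw embedding IDs by how often they occur (hypothetical helper).

    Returns (remap, counts): remap maps each raw ID to its frequency rank,
    where rank 0 is the most frequent ID. Ties are broken by the smaller raw ID
    so the ranking is deterministic.
    """
    counts = Counter(id_stream)
    ranked = sorted(counts, key=lambda raw_id: (-counts[raw_id], raw_id))
    remap = {raw_id: rank for rank, raw_id in enumerate(ranked)}
    return remap, counts


def warmup_ids(remap, cache_rows):
    """Return the raw IDs to preload into a cache of `cache_rows` rows, hottest first."""
    hot = [raw_id for raw_id, rank in remap.items() if rank < cache_rows]
    return sorted(hot, key=remap.get)
```

For the toy stream `[7, 3, 7, 9, 3, 7, 1]`, ID 7 ranks first and ID 3 second, so a two-row cache would be warmed with rows 7 and 3.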

Findings

1. Keeps only 1.5% of the embedding parameters in GPU memory while still obtaining decent end-to-end training speed.
2. Trains efficiently with minimal GPU memory usage.
3. Supports synchronized updates and scales to multiple GPUs.
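To put the 1.5% figure in perspective, here is a back-of-the-envelope sizing helper. The table shape (100 million rows of 128-dimensional fp32 vectors) is a hypothetical example, not a configuration from the paper, and the estimate ignores optimizer state, index buffers, and any other per-row bookkeeping.

```python
def cache_footprint_bytes(num_rows, dim, bytes_per_elem, cache_ratio):
    """Return (full_table_bytes, gpu_cache_bytes) when only a fraction of rows stays on the GPU."""
    full_table_bytes = num_rows * dim * bytes_per_elem
    cache_rows = round(num_rows * cache_ratio)  # rows resident in the GPU cache
    gpu_cache_bytes = cache_rows * dim * bytes_per_elem
    return full_table_bytes, gpu_cache_bytes


# Hypothetical table: 100M rows of 128-dim fp32 (4-byte) embeddings, 1.5% cached.
full, cached = cache_footprint_bytes(100_000_000, 128, 4, 0.015)
print(f"full table: {full / 1e9:.1f} GB, GPU cache: {cached / 1e6:.0f} MB")
```

With these hypothetical numbers the full table needs 51.2 GB, while a 1.5% cache holds 1.5 million rows in 768 MB.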

Abstract

Deep learning recommendation models (DLRMs) are widely used by Internet companies. Their embedding tables are too large to fit entirely in GPU memory. We propose a GPU-based software cache that dynamically manages the embedding table across CPU and GPU memory by leveraging the frequency statistics of IDs in the target dataset. The proposed software cache enables efficient training of entire DLRMs on the GPU with synchronized updates. It also scales to multiple GPUs when combined with widely used hybrid parallel training approaches. An evaluation of our prototype system shows that keeping only 1.5% of the embedding parameters in the GPU is enough to obtain a decent end-to-end training speed.
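The mechanism described in the abstract can be sketched as a toy cache: frequently used rows stay resident in GPU memory, missing rows are fetched from the CPU-side table on demand, and synchronized updates are applied directly to cached rows. Everything here is hypothetical: the class `FreqAwareCache`, its methods, plain Python lists standing in for GPU and CPU tensors, and the least-frequent-first eviction policy are illustrative assumptions, not the paper's implementation or the API of the zxgx/freqcacheembedding repository.

```python
class FreqAwareCache:
    """Toy CPU/GPU embedding cache (hypothetical sketch, not the paper's code).

    cpu_table: full embedding table (list of row lists), standing in for host memory.
    capacity:  maximum number of rows the simulated GPU cache may hold.
    freq:      per-ID frequency statistics gathered from the dataset beforehand.
    """

    def __init__(self, cpu_table, capacity, freq):
        self.cpu_table = cpu_table
        self.capacity = capacity
        self.freq = freq
        self.cached = {}  # raw ID -> row resident in the simulated GPU cache

    def _evict_one(self, protected):
        # Evict the least frequent resident row that the current batch does not need,
        # writing its (possibly updated) values back to the CPU table first.
        victim = min(
            (i for i in self.cached if i not in protected),
            key=lambda i: (self.freq.get(i, 0), i),
        )
        self.cpu_table[victim] = self.cached.pop(victim)

    def lookup(self, batch_ids):
        """Make every row in the batch resident, then return them in batch order."""
        needed = set(batch_ids)
        if len(needed) > self.capacity:
            raise ValueError("batch touches more rows than the cache can hold")
        for i in needed:
            if i not in self.cached:
                if len(self.cached) >= self.capacity:
                    self._evict_one(needed)
                self.cached[i] = list(self.cpu_table[i])  # simulated host-to-device copy
        return [self.cached[i] for i in batch_ids]

    def apply_grads(self, batch_ids, grads, lr):
        # Synchronous SGD step applied in place to the cached rows of this batch.
        for i, g in zip(batch_ids, grads):
            row = self.cached[i]
            for k in range(len(row)):
                row[k] -= lr * g[k]

    def flush(self):
        # Copy every resident row back so the CPU table reflects all updates.
        for i, row in self.cached.items():
            self.cpu_table[i] = list(row)
```

Evicting the least frequent resident row is one simple way to act on the frequency statistics. Because updated rows are written back before they leave the cache and `flush` copies all resident rows back at the end, the CPU-side table stays consistent with synchronous training. A practical implementation would likely batch host-to-device transfers and keep the ID-to-slot index on the GPU; this sketch only shows the control flow.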

Code & Models

Repositories

zxgx/freqcacheembedding (PyTorch, official)

Taxonomy

Topics: Recommender Systems and Techniques · Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis