DQRM: Deep Quantized Recommendation Models

Yang Zhou; Zhen Dong; Ellick Chan; Dhiraj Kalamkar; Diana Marculescu; Kurt Keutzer

arXiv:2410.20046 · cs.IR · October 29, 2024

TL;DR

This paper introduces DQRM, a quantized recommendation model framework that reduces model size and improves efficiency without sacrificing accuracy, enabling deployment on smaller devices and faster training.

Contribution

It proposes a novel INT4 quantization-aware training method for recommendation models, enhancing efficiency and reducing memory and communication overhead.
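
As a rough illustration of what INT4 quantization of an embedding row involves, the sketch below applies symmetric, per-row uniform quantization and then dequantizes the result ("fake quantization"), which is the operation a quantization-aware training forward pass typically simulates. This summary does not describe the paper's actual scheme, so the per-row granularity, the symmetric range, the rounding rule, and all function names here are assumptions chosen for illustration.

```python
from typing import List, Tuple

INT4_MIN, INT4_MAX = -8, 7  # representable range of a signed 4-bit integer


def quantize_row_int4(row: List[float]) -> Tuple[List[int], float]:
    """Quantize one embedding row to INT4 codes with a single per-row scale.

    Assumption: symmetric quantization, largest magnitude mapped to code 7.
    """
    max_abs = max((abs(v) for v in row), default=0.0)
    # Fall back to scale 1.0 for an all-zero row to avoid dividing by zero.
    scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
    # Round to the nearest integer code, then clamp into the INT4 range.
    codes = [min(INT4_MAX, max(INT4_MIN, round(v / scale))) for v in row]
    return codes, scale


def dequantize_row(codes: List[int], scale: float) -> List[float]:
    """Map INT4 codes back to floating-point values."""
    return [c * scale for c in codes]


def fake_quantize_row(row: List[float]) -> List[float]:
    """Quantize then dequantize, so downstream code sees INT4-rounded floats."""
    codes, scale = quantize_row_int4(row)
    return dequantize_row(codes, scale)
```

In quantization-aware training generally, the forward pass uses the fake-quantized values while gradients are passed through the rounding step unchanged (the straight-through estimator); that backward rule is not shown here.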

Findings

01. Achieved INT4 quantization of DLRM without accuracy loss.

02. Model size reduced to 0.27 GB on the Kaggle dataset and 1.57 GB on the Terabyte dataset.

03. The INT4 models outperformed larger FP32 models in accuracy.
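
To make the size reduction concrete, the following sketch computes the storage of an embedding table at different bit widths. The table shape is hypothetical and is not taken from the paper; the calculation also ignores per-row scale factors and the dense (MLP) layers, so it only shows the ideal 8x ratio between FP32 and INT4 storage.

```python
def table_size_gb(num_rows: int, dim: int, bits: int) -> float:
    """Storage of a num_rows x dim table at `bits` bits per value, in GB (1e9 bytes)."""
    return num_rows * dim * bits / 8 / 1e9


# Hypothetical table shape, chosen only for illustration.
rows, dim = 10_000_000, 16

fp32_gb = table_size_gb(rows, dim, 32)  # 4 bytes per value
int4_gb = table_size_gb(rows, dim, 4)   # half a byte per value

print(f"FP32: {fp32_gb:.2f} GB, INT4: {int4_gb:.2f} GB")
```

Because embedding tables dominate the parameter count of these models, shrinking each value from 32 bits to 4 bits accounts for most of the overall reduction in model size.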

Abstract

Large-scale recommendation models are currently the dominant workload for many large Internet companies. These recommenders are characterized by massive embedding tables that are sparsely accessed by the index for user and item features. The size of these 1TB+ tables imposes a severe memory bottleneck for the training and inference of recommendation models. In this work, we propose a novel recommendation framework that is small, powerful, and efficient to run and train, based on the state-of-the-art Deep Learning Recommendation Model (DLRM). The proposed framework makes inference more efficient on the cloud servers, explores the possibility of deploying powerful recommenders on smaller edge devices, and optimizes the workload of the communication overhead in distributed training under the data parallelism settings. Specifically, we show that quantization-aware training (QAT) can impose…

Code & Models

Repositories

YangZhou08/Deep_Quantized_Recommendation_Model_DQRM (PyTorch, official)

Taxonomy

Topics: Recommender Systems and Techniques · Image Retrieval and Classification Techniques

Methods: Gradient Sparsification