DQRM: Deep Quantized Recommendation Models
Yang Zhou, Zhen Dong, Ellick Chan, Dhiraj Kalamkar, Diana Marculescu, Kurt Keutzer

TL;DR
This paper introduces DQRM, a quantized recommendation framework built on DLRM that shrinks model size without sacrificing accuracy. It makes cloud inference more efficient, opens the possibility of deploying recommenders on smaller edge devices, and reduces communication overhead in distributed training.
Contribution
The paper proposes an INT4 quantization-aware training (QAT) method for recommendation models that improves efficiency by reducing memory footprint and communication overhead.
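To make the idea of INT4 quantization concrete, here is a minimal sketch of a symmetric quantize-dequantize ("fake quantization") step applied to one row of weights, such as a single embedding vector. The function name, the per-row scale, and the symmetric range [-8, 7] are assumptions chosen for illustration; the summary does not specify the quantizer DQRM actually uses. In quantization-aware training, a step like this is commonly placed in the forward pass so the model learns with the rounding error present.

```python
def fake_quantize_int4(row):
    """Quantize a list of floats to 4-bit integer codes and map them back.

    Illustrative assumption: symmetric quantization with one scale per row.
    """
    qmin, qmax = -8, 7  # range of a signed 4-bit integer
    max_abs = max(abs(w) for w in row)
    # Choose the scale so the largest magnitude maps to qmax; avoid dividing by zero.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp into the 4-bit range.
    codes = [min(qmax, max(qmin, round(w / scale))) for w in row]
    # Dequantize: these are the values the forward pass would see during QAT.
    dequantized = [c * scale for c in codes]
    return codes, dequantized, scale


# Hypothetical embedding row, used only to exercise the function.
codes, dequantized, scale = fake_quantize_int4([0.7, -0.35, 0.1, 0.0])
```

With this scheme each weight is stored as one of 16 codes plus a shared scale, and the reconstruction error per weight is at most half the scale.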
Findings
Achieved INT4 quantization of DLRM without accuracy loss.
Model size reduced to 0.27 GB on Kaggle and 1.57 GB on Terabyte datasets.
Outperformed larger FP32 models in accuracy.
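The reported sizes can be related to FP32 storage with simple arithmetic. The sketch below assumes every parameter shrinks from 32 bits to 4 bits and ignores any overhead for scales or other metadata; both helper functions are hypothetical, and the implied FP32 sizes are back-of-the-envelope estimates rather than figures taken from this summary.

```python
def quantized_size_gb(fp32_size_gb, bits=4):
    """Size after storing each 32-bit parameter in `bits` bits."""
    return fp32_size_gb * bits / 32


def implied_fp32_size_gb(quantized_size_gb_value, bits=4):
    """Invert the ratio: the FP32 size that would compress to the given size."""
    return quantized_size_gb_value * 32 / bits


# Sizes reported in the findings above (GB).
kaggle_fp32_estimate = implied_fp32_size_gb(0.27)
terabyte_fp32_estimate = implied_fp32_size_gb(1.57)
```

Under this assumption INT4 storage is 8x smaller than FP32, so the reported 0.27 GB and 1.57 GB models would correspond to FP32 models of roughly 2.2 GB and 12.6 GB.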
Abstract
Large-scale recommendation models are currently the dominant workload for many large Internet companies. These recommenders are characterized by massive embedding tables that are sparsely accessed by the index for user and item features. The size of these 1TB+ tables imposes a severe memory bottleneck for the training and inference of recommendation models. In this work, we propose a novel recommendation framework that is small, powerful, and efficient to run and train, based on the state-of-the-art Deep Learning Recommendation Model (DLRM). The proposed framework makes inference more efficient on the cloud servers, explores the possibility of deploying powerful recommenders on smaller edge devices, and optimizes the workload of the communication overhead in distributed training under the data parallelism settings. Specifically, we show that quantization-aware training (QAT) can impose…
Taxonomy
Topics: Recommender Systems and Techniques · Image Retrieval and Classification Techniques
Methods: Gradient Sparsification
