The trade-offs of model size in large recommendation models: A 10000$\times$ compressed criteo-tb DLRM model (100 GB parameters to mere 10 MB)

Aditya Desai; Anshumali Shrivastava

arXiv:2207.10731 · cs.LG · July 25, 2022

TL;DR

This paper presents a method for compressing large recommendation models, specifically DLRMs, by 10,000× through parameter sharing. The compressed model needs far less memory and costs less to serve while maintaining model quality, but it requires more training iterations.

Contribution

The paper presents a generic parameter sharing setup (PSS) for DLRMs, provides theoretical upper bounds on the memory it requires, demonstrates 10,000× compression without quality loss, and analyzes the trade-offs involved.

Findings

01

Achieved 10,000× compression of a DLRM on the criteo-tb dataset.

02

The small size of the compressed model yields a 4.3× improvement in training latency.

03

A trade-off exists between the slower convergence of smaller models and their system benefits.
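
The sizes quoted in the title can be checked with simple arithmetic. The sketch below assumes decimal units (1 GB = 10^9 bytes) and 4-byte float32 parameters; neither assumption is stated on this page, and the variable names are illustrative only.

```python
# Back-of-the-envelope check of the sizes quoted in the title.
# Assumptions (not stated in the source): decimal units and float32 storage.
GB = 10**9
MB = 10**6
BYTES_PER_PARAM = 4  # float32, an assumption

original_bytes = 100 * GB   # embedding memory of the uncompressed model
compression = 10_000        # compression factor from the title

compressed_bytes = original_bytes // compression
original_params = original_bytes // BYTES_PER_PARAM
compressed_params = compressed_bytes // BYTES_PER_PARAM

print(f"compressed size: {compressed_bytes / MB:.0f} MB")
print(f"parameters: {original_params:,} -> {compressed_params:,}")
```

Under these assumptions, 100 GB corresponds to 25 billion parameters, which agrees with the parameter count given in the abstract, and the compressed model holds 2.5 million parameters in 10 MB.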

Abstract

Embedding tables dominate industrial-scale recommendation model sizes, using up to terabytes of memory. The largest publicly available MLPerf machine learning benchmark on recommendation data, and a popular one, is a Deep Learning Recommendation Model (DLRM) trained on a terabyte of click-through data. It contains 100 GB of embedding memory (25+ billion parameters). Because of their sheer size and the associated volume of data, DLRMs are difficult to train and to deploy for inference, and their large embedding tables create memory bottlenecks. This paper analyzes and extensively evaluates a generic parameter sharing setup (PSS) for compressing DLRM models. We show theoretical upper bounds on the learnable memory requirements for achieving $(1 \pm \epsilon)$ approximations to the embedding table. Our bounds indicate that exponentially fewer parameters suffice for good accuracy. To this end, we demonstrate a…
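
To make the idea of parameter sharing concrete, here is a minimal sketch in which every element of every embedding vector is read from one small shared weight array. It uses a simple modular hash to pick the shared slots; the hash, the class name `SharedEmbedding`, and all sizes are assumptions made for illustration, and the setup evaluated in the paper may differ.

```python
import numpy as np


class SharedEmbedding:
    """Toy parameter-sharing embedding (illustrative, not the paper's code).

    Each element (table_id, row_id, column) is mapped by a hash to a slot
    in one shared array, so the learnable memory is `memory_size` floats
    no matter how many rows the logical tables have.
    """

    def __init__(self, memory_size, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.memory_size = memory_size
        self.dim = dim
        # The only learnable parameters: one small shared array.
        self.memory = rng.normal(scale=0.01, size=memory_size).astype(np.float32)
        # Coefficients of a simple modular hash (an assumed choice).
        self.prime = 2_147_483_647  # 2**31 - 1
        self.a, self.b, self.c, self.d = (
            int(x) for x in rng.integers(1, self.prime, size=4)
        )

    def _slots(self, table_id, row_id):
        # Reduce the scalar part with Python ints first to avoid overflow.
        base = (self.a * table_id + self.b * row_id + self.d) % self.prime
        cols = np.arange(self.dim, dtype=np.int64)
        return ((base + self.c * cols) % self.prime) % self.memory_size

    def lookup(self, table_id, row_id):
        # Gather the embedding vector from the shared array.
        return self.memory[self._slots(table_id, row_id)]


emb = SharedEmbedding(memory_size=1000, dim=16)
vec = emb.lookup(table_id=3, row_id=10**9)  # any row id works, no table is stored
print(vec.shape, emb.memory.size)
```

The lookup is deterministic, so the same (table, row) pair always reads the same slots, while different pairs may collide and share weights. That collision is the source of the compression, and in this toy version the gradient of a shared slot would simply accumulate contributions from every element mapped to it.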

Taxonomy

Topics: Recommender Systems and Techniques · Stochastic Gradient Optimization Techniques · Topic Modeling