SHARK: A Lightweight Model Compression Approach for Large-scale Recommender Systems
Beichuan Zhang, Chenggen Sun, Jianchao Tan, Xinjun Cai, Jun Zhao, Mengqi Miao, Kang Yin, Chengru Song, Na Mou, Yang Song

TL;DR
SHARK is a novel model compression method for large-scale recommender systems. It reduces memory and computational costs while maintaining or improving performance, and it is lightweight enough for industrial deployment.
Contribution
The paper introduces SHARK, combining Taylor expansion-based importance pruning and row-wise quantization, advancing model compression techniques for recommender systems.
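The pruning idea can be illustrated with a minimal sketch of a generic first-order Taylor criterion. It is not the paper's exact formulation. Zeroing an embedding e changes the loss by roughly -g·e, where g is the loss gradient with respect to e, so |g·e| averaged over samples serves as a field-level importance score. The function names, the dictionary layout, and the toy values are all assumptions made for illustration.

```python
def taylor_importance(embeddings, gradients):
    """Score each field by the mean over samples of |g . e|.

    embeddings[field] and gradients[field] are lists of equal-length
    vectors, one per sample (a hypothetical layout for this sketch).
    """
    scores = {}
    for field, vecs in embeddings.items():
        grads = gradients[field]
        total = 0.0
        for e, g in zip(vecs, grads):
            # First-order estimate of the loss change if e were zeroed.
            total += abs(sum(ei * gi for ei, gi in zip(e, g)))
        scores[field] = total / len(vecs)
    return scores


def fields_to_prune(scores, num_to_prune):
    """Return the names of the lowest-scoring fields."""
    ranked = sorted(scores, key=scores.get)
    return ranked[:num_to_prune]


# Toy data: two feature fields, two samples each, embedding dimension 2.
embeddings = {
    "user_id": [[1.0, 2.0], [0.5, -1.0]],
    "item_id": [[0.1, 0.1], [0.2, 0.0]],
}
gradients = {
    "user_id": [[0.5, 0.5], [1.0, 1.0]],
    "item_id": [[0.1, -0.1], [0.0, 0.3]],
}

scores = taylor_importance(embeddings, gradients)
pruned = fields_to_prune(scores, num_to_prune=1)
print(scores, pruned)
```

In practice the gradients would come from backpropagation over a batch of training or validation data. The lists here stand in for that step.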
Findings
Achieves 70% storage reduction without performance loss in short-video models.
Improves query processing speed by 30%.
Successfully deployed in industrial settings serving hundreds of millions of users.
Abstract
Increasing the size of embedding layers has been shown to be effective in improving the performance of recommendation models, but it has gradually pushed model sizes beyond terabytes in industrial recommender systems, increasing computing and storage costs. To save resources while maintaining model performance, we propose SHARK, a model compression practice we have distilled from industrial recommender systems. SHARK consists of two main components. First, we use a novel first-order Taylor expansion term as the importance score to prune embedding tables (feature fields). Second, we introduce a new row-wise quantization method that applies different quantization strategies to each embedding. We conduct extensive experiments on both public and industrial datasets, demonstrating that each component of our proposed SHARK framework outperforms…
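To make the second component concrete, here is a minimal sketch of generic row-wise uniform quantization, in which each embedding row gets its own scale, offset, and bit-width. The abstract does not specify how SHARK chooses the strategy for each row, so the per-row bit-widths below are hypothetical, as are the function names and the toy table.

```python
def quantize_row(row, bits=8):
    """Uniformly quantize one embedding row using its own min/max range."""
    levels = (1 << bits) - 1
    lo, hi = min(row), max(row)
    # A constant row has zero range; any positive scale works in that case.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((x - lo) / scale))) for x in row]
    return codes, scale, lo


def dequantize_row(codes, scale, offset):
    """Map integer codes back to approximate float values."""
    return [c * scale + offset for c in codes]


# Toy embedding table with a hypothetical bit-width assigned to each row,
# e.g. more bits for rows considered more important.
table = [
    [-1.0, 0.0, 0.5, 1.0],
    [0.2, 0.4, 0.1, 0.3],
]
bits_per_row = [8, 4]

quantized = [quantize_row(row, bits) for row, bits in zip(table, bits_per_row)]
restored = [dequantize_row(codes, scale, offset)
            for codes, scale, offset in quantized]

for row, approx in zip(table, restored):
    print(row, "->", [round(v, 4) for v in approx])
```

Because each row stores its own scale and offset, the reconstruction error for a row is bounded by half of that row's quantization step, regardless of the value range of other rows.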
Taxonomy
Topics: Recommender Systems and Techniques · Image Retrieval and Classification Techniques · Advanced Data Compression Techniques
