Alternate Model Growth and Pruning for Efficient Training of Recommendation Systems
Xiaocong Du, Bhargav Bhushanam, Jiecao Yu, Dhruv Choudhary, Tianxiang Gao, Sherman Wong, Louis Feng, Jongsoo Park, Yu Cao, Arun Kejariwal

TL;DR
This paper introduces a novel dynamic training scheme called alternate model growth and pruning, which reduces training costs of recommendation systems while maintaining model capacity and accuracy.
Contribution
It proposes a new structured sparsification method that alternates between growing and pruning weights during training, enabling efficient large-scale recommendation system training.
Findings
Reduces training cost without sacrificing model capacity.
Effective on open-source and industrial-scale models.
First in-depth analysis of structural dynamics in recommendation systems.
Abstract
Deep learning recommendation systems at scale have provided remarkable gains through increasing model capacity (i.e., wider and deeper neural networks), but this comes at significant training and infrastructure cost. Model pruning is an effective technique to reduce computation overhead for deep neural networks by removing redundant parameters. However, modern recommendation systems are still thirsty for model capacity due to the demand for handling big data. Thus, pruning a recommendation model at scale results in a smaller model capacity and consequently lower accuracy. To reduce computation cost without sacrificing model capacity, we propose a dynamic training scheme, namely alternate model growth and pruning, to alternately construct and prune weights in the course of training. Our method leverages structured sparsification to reduce computational cost without hurting the model…
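The alternating schedule described in the abstract can be illustrated with a small sketch. The summary does not state the paper's pruning criterion, growth rule, or schedule, so everything here is an assumption made for illustration: rows of a weight matrix stand in for structured units, pruning keeps the rows with the largest L2 norm, growth re-activates pruned rows with small random values, and the two phases alternate on a fixed period. The function names (`prune_rows`, `grow_rows`, `alternate_schedule`) are hypothetical.

```python
import math
import random


def row_norms(weights):
    """L2 norm of each row (one row = one structured unit, e.g. a neuron)."""
    return [math.sqrt(sum(w * w for w in row)) for row in weights]


def prune_rows(weights, mask, keep_ratio):
    """Structured pruning (assumed criterion): keep the active rows with the
    largest L2 norm and zero out the remaining active rows."""
    active = [i for i, m in enumerate(mask) if m]
    n_keep = max(1, int(round(len(active) * keep_ratio)))
    norms = row_norms(weights)
    keep = set(sorted(active, key=lambda i: norms[i], reverse=True)[:n_keep])
    for i in active:
        if i not in keep:
            mask[i] = False
            weights[i] = [0.0] * len(weights[i])
    return mask


def grow_rows(weights, mask, rng, scale=0.01):
    """Growth (assumed rule): re-activate every pruned row with small
    Gaussian values so it can be trained again."""
    for i, m in enumerate(mask):
        if not m:
            mask[i] = True
            weights[i] = [rng.gauss(0.0, scale) for _ in weights[i]]
    return mask


def alternate_schedule(weights, n_steps, period, keep_ratio, seed=0):
    """Alternate prune and grow phases every `period` steps and record how
    many rows are active after each step."""
    rng = random.Random(seed)
    mask = [True] * len(weights)
    history = []
    for step in range(n_steps):
        # A real training step updating only the active rows would go here.
        if step > 0 and step % period == 0:
            if (step // period) % 2 == 1:
                prune_rows(weights, mask, keep_ratio)
            else:
                grow_rows(weights, mask, rng)
        history.append(sum(mask))
    return history


if __name__ == "__main__":
    w = [[3.0, 4.0], [0.1, 0.0], [1.0, 1.0], [0.0, 0.2]]
    print(alternate_schedule(w, n_steps=8, period=2, keep_ratio=0.5))
```

In this toy setup the dense phases restore the full row count, while the sparse phases are where a structured-sparse implementation would save compute. The savings themselves are not modeled here.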
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Recommender Systems and Techniques · Stochastic Gradient Optimization Techniques
Methods: Pruning
