Boosting Parameter Efficiency in LLM-Based Recommendation through Sophisticated Pruning
Shanle Zheng, Keqin Bao, Jizhi Zhang, Yang Zhang, Fuli Feng, Xiangnan He

TL;DR
This paper introduces a three-stage parameter pruning method for large language model (LLM)-based recommender systems that sharply reduces model size while largely preserving recommendation quality.
Contribution
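The paper identifies intra-layer redundancy within LLM components (self-attention and MLP modules) and proposes a fine-grained, three-stage pruning strategy that prunes both within layers and across layers, with a performance restoration step after each stage, improving the parameter efficiency of LLM-based recommenders.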
Findings
Retains 88% of the original model's performance after pruning more than 95% of its parameters.
Reduces the resources needed to deploy LLM-based recommenders.
Shows consistent results across three datasets.
Abstract
LLM-based recommender systems have made significant progress; however, the deployment cost associated with the large parameter volume of LLMs still hinders their real-world applications. This work explores parameter pruning to improve parameter efficiency while maintaining recommendation quality, thereby enabling easier deployment. Unlike existing approaches that focus primarily on inter-layer redundancy, we uncover intra-layer redundancy within components such as self-attention and MLP modules. Building on this analysis, we propose a more fine-grained pruning approach that integrates both intra-layer and layer-wise pruning. Specifically, we introduce a three-stage pruning strategy that progressively prunes parameters at different levels and parts of the model, moving from intra-layer to layer-wise pruning, or from width to depth. Each stage also includes a performance restoration step…
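The abstract describes the pipeline only at a high level, so the following is a toy NumPy sketch of the general width-then-depth idea, not the authors' implementation. Several choices in it are assumptions made for illustration: the split into an attention-head stage, an MLP-neuron stage and a layer stage; the magnitude-based width scores; the layer score (cosine similarity between a layer's input and output on calibration data, a common depth-pruning heuristic); and the hypothetical `restore` hook, which is a no-op standing in for the paper's performance restoration step. The `Layer` class is a simplified residual block whose "heads" do no token mixing.

```python
import numpy as np

rng = np.random.default_rng(0)

class Layer:
    """Toy residual block: per-head branches (stand-ins for attention heads,
    with no token mixing) plus a ReLU MLP. Not a real LLM layer."""

    def __init__(self, d, n_heads, head_dim, hidden):
        s = 1.0 / np.sqrt(d)
        self.v = [rng.normal(scale=s, size=(d, head_dim)) for _ in range(n_heads)]
        self.o = [rng.normal(scale=s, size=(head_dim, d)) for _ in range(n_heads)]
        self.w_in = rng.normal(scale=s, size=(d, hidden))
        self.w_out = rng.normal(scale=s, size=(hidden, d))

    def n_params(self):
        heads = sum(v.size + o.size for v, o in zip(self.v, self.o))
        return heads + self.w_in.size + self.w_out.size

    def forward(self, x):
        attn = sum(np.tanh(x @ v) @ o for v, o in zip(self.v, self.o))
        mlp = np.maximum(x @ self.w_in, 0.0) @ self.w_out
        return x + attn + mlp

def prune_heads(layer, keep_ratio):
    # Width stage (assumed): score each head by the product of its projection norms.
    scores = [np.linalg.norm(v) * np.linalg.norm(o) for v, o in zip(layer.v, layer.o)]
    k = max(1, int(round(keep_ratio * len(scores))))
    keep = sorted(np.argsort(scores)[-k:])
    layer.v = [layer.v[i] for i in keep]
    layer.o = [layer.o[i] for i in keep]

def prune_neurons(layer, keep_ratio):
    # Width stage (assumed): score each MLP hidden unit by its in/out weight norms.
    scores = np.linalg.norm(layer.w_in, axis=0) * np.linalg.norm(layer.w_out, axis=1)
    k = max(1, int(round(keep_ratio * scores.size)))
    keep = np.sort(np.argsort(scores)[-k:])
    layer.w_in = layer.w_in[:, keep]
    layer.w_out = layer.w_out[keep, :]

def prune_layers(layers, n_drop, x_calib):
    # Depth stage (assumed heuristic): drop the layers whose output is most
    # similar to their input, i.e. the layers that change the hidden state least.
    sims, h = [], x_calib
    for layer in layers:
        out = layer.forward(h)
        cos = np.sum(h * out, axis=1) / (
            np.linalg.norm(h, axis=1) * np.linalg.norm(out, axis=1))
        sims.append(cos.mean())
        h = out
    drop = set(np.argsort(sims)[-n_drop:].tolist()) if n_drop > 0 else set()
    return [layer for i, layer in enumerate(layers) if i not in drop]

def restore(layers):
    """Hypothetical hook for the per-stage performance restoration step.
    A real pipeline would retrain here; this sketch leaves the model unchanged."""
    return layers

def three_stage_prune(layers, x_calib, head_keep=0.5, neuron_keep=0.25, n_drop=2):
    for layer in layers:                             # stage 1: width, heads
        prune_heads(layer, head_keep)
    layers = restore(layers)
    for layer in layers:                             # stage 2: width, MLP units
        prune_neurons(layer, neuron_keep)
    layers = restore(layers)
    layers = prune_layers(layers, n_drop, x_calib)   # stage 3: depth
    return restore(layers)

def total_params(layers):
    return sum(layer.n_params() for layer in layers)

model = [Layer(d=64, n_heads=8, head_dim=8, hidden=256) for _ in range(6)]
before = total_params(model)
x_calib = rng.normal(size=(32, 64))
pruned = three_stage_prune(model, x_calib)
after = total_params(pruned)
print(f"params: {before} -> {after} ({after / before:.1%} kept)")
```

The ordering (width before depth, scoring layers only after their widths have been pruned) follows the abstract's description; everything else is a placeholder. With these toy settings the script keeps 20% of the parameters. That ratio comes only from the keep ratios chosen here and says nothing about the paper's results.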
Taxonomy
Topics: Recommender Systems and Techniques · Advanced Neural Network Applications · Explainable Artificial Intelligence (XAI)
Methods: Pruning · Focus
