Boosting Parameter Efficiency in LLM-Based Recommendation through Sophisticated Pruning

Shanle Zheng; Keqin Bao; Jizhi Zhang; Yang Zhang; Fuli Feng; Xiangnan He

arXiv:2507.07064·cs.IR·July 10, 2025

TL;DR

This paper introduces a three-stage parameter pruning method for LLM-based recommender systems that prunes over 95% of parameters while retaining 88% of the original recommendation performance.

Contribution

The paper uncovers intra-layer redundancy within LLM components such as self-attention and MLP modules, and proposes a fine-grained, three-stage pruning strategy that moves from intra-layer to layer-wise pruning, with a performance restoration step in each stage.

Findings

01 Achieves 88% of original performance after pruning over 95% of parameters.

02 Effectively reduces resource requirements for LLM-based recommenders.

03 Demonstrates robustness across three datasets.

Abstract

LLM-based recommender systems have made significant progress; however, the deployment cost associated with the large parameter volume of LLMs still hinders their real-world applications. This work explores parameter pruning to improve parameter efficiency while maintaining recommendation quality, thereby enabling easier deployment. Unlike existing approaches that focus primarily on inter-layer redundancy, we uncover intra-layer redundancy within components such as self-attention and MLP modules. Building on this analysis, we propose a more fine-grained pruning approach that integrates both intra-layer and layer-wise pruning. Specifically, we introduce a three-stage pruning strategy that progressively prunes parameters at different levels and parts of the model, moving from intra-layer to layer-wise pruning, or from width to depth. Each stage also includes a performance restoration step…
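
The width-to-depth ordering described in the abstract can be made concrete with some parameter arithmetic. The sketch below is not the authors' implementation: the `Config` shape, the keep ratios, and the assignment of stages (attention heads first, then MLP width, then whole layers) are all illustrative assumptions, and the per-stage performance restoration step is not modeled. It only counts how many block weights remain after each assumed stage.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Config:
    """Hypothetical decoder-only transformer shape (not taken from the paper)."""
    n_layers: int
    d_model: int
    n_heads: int
    head_dim: int
    d_ff: int


def layer_params(cfg: Config) -> int:
    """Weights in one block, ignoring biases, norms, and embeddings."""
    attn = 4 * cfg.d_model * cfg.n_heads * cfg.head_dim  # Q, K, V, O projections
    mlp = 2 * cfg.d_model * cfg.d_ff                     # up and down projections
    return attn + mlp


def total_params(cfg: Config) -> int:
    """Weights across all blocks."""
    return cfg.n_layers * layer_params(cfg)


def prune_in_stages(cfg: Config, keep_heads: int, keep_ff: int, keep_layers: int):
    """Shrink width first (inside each layer), then depth (whole layers).

    The stage order is an assumption made for this sketch.
    """
    after_attn = replace(cfg, n_heads=keep_heads)            # intra-layer: attention
    after_mlp = replace(after_attn, d_ff=keep_ff)            # intra-layer: MLP
    after_depth = replace(after_mlp, n_layers=keep_layers)   # layer-wise
    return [
        ("stage 1: attention width", after_attn),
        ("stage 2: MLP width", after_mlp),
        ("stage 3: depth", after_depth),
    ]


# Made-up model size and keep targets, chosen only to show the arithmetic.
base = Config(n_layers=24, d_model=1024, n_heads=16, head_dim=64, d_ff=4096)
stages = prune_in_stages(base, keep_heads=4, keep_ff=512, keep_layers=6)

for name, cfg in stages:
    kept = total_params(cfg) / total_params(base)
    print(f"{name}: {total_params(cfg):,} weights, {kept:.1%} of original")
```

With these made-up numbers, the two width stages alone remove about five sixths of the block weights, and dropping layers afterwards brings the total reduction to roughly 95.8%. Which units to remove at each stage, and how performance is restored afterwards, is what the paper's method decides.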

Code & Models

Repositories

zheng-sl/prunerec (PyTorch, official)

Taxonomy

Topics: Recommender Systems and Techniques · Advanced Neural Network Applications · Explainable Artificial Intelligence (XAI)

Methods: Pruning · Focus