MiniRec: Data-Efficient Reinforcement Learning for LLM-based Recommendation
Lin Wang, Yang Zhang, Jingfan Chen, Xiaoyan Zhao, Fengbin Zhu, Qing Li, Tat-Seng Chua

TL;DR
MiniRec introduces a reward-based, trajectory-aligned data selection method for RL-enhanced LLM recommendation systems, significantly reducing training costs while maintaining high performance.
Contribution
The paper presents MiniRec, a data selection framework that aligns sample selection with RL signals and optimization trajectories to improve the efficiency of RL-based LLM recommendation.
Findings
Reduces training cost by up to 50%
Maintains recommendation performance with fewer samples
Highlights the importance of reward-aligned data selection
Abstract
The integration of reinforcement learning (RL) into large language models (LLMs) has opened new opportunities for recommender systems by eliciting reasoning and improving user preference modeling. However, RL-based LLM recommendation faces significant efficiency challenges, making full-data training costly. Existing data selection methods define sample value based on learnability or representativeness, yet their criteria, whether driven by loss, gradients, or dataset coverage, often misalign with RL learning dynamics, resulting in suboptimal performance. To address this, we propose MiniRec, a data selection framework tailored for RL-based LLM recommendation. MiniRec evaluates sample learnability using a key RL signal, the reward, pruning samples that are too easy (too high reward) or too difficult (consistently low reward). It assesses representativeness by aligning sample gradients with the…
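The reward-based learnability criterion described in the abstract can be illustrated with a small filter. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each sample has a list of observed rewards (for example, from several rollouts or earlier training steps), treats a sample as "too easy" when its mean reward reaches a hypothetical `easy_threshold`, and as "too difficult" when every observed reward stays at or below a hypothetical `hard_threshold`. The `Sample` class, `prune_by_reward` function, and both threshold values are illustrative names chosen here. The representativeness step (gradient alignment) is not sketched because the abstract is truncated before its target is specified.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence


@dataclass
class Sample:
    # Hypothetical container: a sample ID plus the rewards observed for it
    # (e.g., across rollouts); not taken from the MiniRec paper.
    sample_id: str
    rewards: Sequence[float]


def prune_by_reward(
    samples: Sequence[Sample],
    easy_threshold: float = 0.9,  # hypothetical cutoff for "too high reward"
    hard_threshold: float = 0.1,  # hypothetical cutoff for "consistently low reward"
) -> List[Sample]:
    """Keep samples that are neither too easy nor too difficult (illustrative sketch)."""
    if hard_threshold >= easy_threshold:
        raise ValueError("hard_threshold must be below easy_threshold")
    kept = []
    for s in samples:
        if not s.rewards:
            raise ValueError(f"sample {s.sample_id!r} has no observed rewards")
        if mean(s.rewards) >= easy_threshold:
            continue  # too easy: the model already earns high reward on average
        if max(s.rewards) <= hard_threshold:
            continue  # too difficult: reward is low in every observation
        kept.append(s)
    return kept


if __name__ == "__main__":
    pool = [
        Sample("a", [1.0, 1.0, 0.9]),   # mean ~0.97, pruned as too easy
        Sample("b", [0.0, 0.0, 0.05]),  # all rewards <= 0.1, pruned as too hard
        Sample("c", [0.2, 0.8, 0.5]),   # mixed rewards, kept
    ]
    print([s.sample_id for s in prune_by_reward(pool)])  # prints ['c']
```

Using the mean for the "easy" test and the maximum for the "hard" test mirrors the abstract's wording ("too high reward" versus "consistently low reward"); other aggregations, such as reward variance across rollouts, would be equally plausible readings.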
Taxonomy
Topics: Recommender Systems and Techniques · Topic Modeling · Explainable Artificial Intelligence (XAI)
