SPRec: Self-Play to Debias LLM-based Recommendation
Chongming Gao, Ruijun Chen, Shuai Yuan, Kexin Huang, Yuanqing Yu, Xiangnan He

TL;DR
SPRec introduces a self-play framework combining supervised fine-tuning and preference optimization to reduce bias and improve fairness in LLM-based recommendation systems, outperforming existing methods without extra data.
Contribution
The paper proposes SPRec, a novel self-play approach that mitigates bias and enhances fairness in LLM recommendation models through iterative fine-tuning and preference optimization.
Findings
SPRec improves recommendation accuracy on multiple datasets.
SPRec enhances fairness by reducing over-recommendation bias.
The method outperforms traditional fine-tuning and DPO approaches.
Abstract
Large language models (LLMs) have attracted significant attention in recommendation systems. Current work primarily applies supervised fine-tuning (SFT) to adapt the model for recommendation tasks. However, SFT on positive examples only limits the model's ability to align with user preference. To address this, researchers recently introduced Direct Preference Optimization (DPO), which explicitly aligns LLMs with user preferences using offline preference ranking data. However, we found that DPO inherently biases the model towards a few items, exacerbating the filter bubble issue and ultimately degrading user experience. In this paper, we propose SPRec, a novel self-play framework designed to mitigate over-recommendation and improve fairness without requiring additional data or manual intervention. In each self-play iteration, the model undergoes an SFT step followed by a DPO step,…
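The iteration structure described in the abstract (an SFT step followed by a DPO step) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `dpo_loss` is the standard per-pair DPO objective, while `self_play`, `sft_step`, `dpo_step`, and the default `beta=0.1` are hypothetical names and values chosen for the example. How SPRec builds its preference pairs is not shown, because the abstract is truncated before that detail.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence log-probabilities under the trained policy and
    under a frozen reference model. beta=0.1 is an assumed default.
    """
    # How much more the policy prefers chosen over rejected,
    # relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) written in a numerically stable form.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def self_play(model, sft_step, dpo_step, iterations):
    """Alternate an SFT step and a DPO step for a fixed number of rounds.

    sft_step and dpo_step are hypothetical callables that take a model
    and return the updated model.
    """
    for _ in range(iterations):
        model = sft_step(model)
        model = dpo_step(model)
    return model
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2; the loss falls as the policy favors the chosen item more strongly than the reference does.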
Taxonomy
Topics: Recommender Systems and Techniques · Topic Modeling · Mental Health via Writing
Methods: Softmax · Attention Is All You Need · ALIGN · Direct Preference Optimization · Shrink and Fine-Tune
