Listwise Preference Alignment Optimization for Tail Item Recommendation
Zihao Li, Chao Yang, Tong Zhang, Yakun Chen, Xianzhi Wang, Guandong Xu, Daoyi Dong

TL;DR
This paper introduces LPO4Rec, a listwise preference alignment method for tail-item recommendation that improves training efficiency and effectiveness, outperforming baselines and reducing memory usage.
Contribution
It extends the Bradley-Terry model to listwise comparison, derives an optimal policy for efficient training, and introduces strategies to enhance tail-item recommendation performance.
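To make the pairwise-to-listwise step concrete, here is a minimal sketch in plain Python. It is not the paper's derived objective: it assumes the common softmax-style generalization, in which one preferred item competes against a whole list of negatives at once, and it leaves out the reference policy, temperature, adaptive negative sampling, and reweighting. The function names `pairwise_bt_loss` and `listwise_loss` are hypothetical.

```python
import math


def pairwise_bt_loss(pos_score, neg_score):
    """Negative log Bradley-Terry probability that the positive item beats one negative.

    Assumption: scores are plain real numbers (e.g. model logits).
    """
    # -log sigmoid(pos - neg) == log(1 + exp(neg - pos))
    return math.log1p(math.exp(neg_score - pos_score))


def listwise_loss(pos_score, neg_scores):
    """Negative log-probability that the positive item is preferred over every negative.

    Assumption: a softmax over the positive and all negatives, which is one
    common way to generalize the pairwise case to a list.
    """
    scores = [pos_score] + list(neg_scores)
    # Subtract the max score before exponentiating (log-sum-exp trick) for stability.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - pos_score


if __name__ == "__main__":
    # With a single negative, the listwise form matches the pairwise one.
    print(pairwise_bt_loss(2.0, 1.0), listwise_loss(2.0, [1.0]))
    # One listwise term covers several negatives in a single comparison.
    print(listwise_loss(2.0, [1.0, 0.5, -0.3]))
```

Under this assumed form, a list of K negatives contributes one loss term rather than K separate pairwise terms, which is the kind of efficiency on negative samples the summary refers to.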
Findings
Outperforms 10 baselines with up to 50% performance gain.
Reduces GPU memory usage by 17.9% compared to DPO.
Effectively improves tail-item recommendation quality.
Abstract
Preference alignment has achieved great success on Large Language Models (LLMs) and drawn broad interest in recommendation research. Existing preference alignment methods for recommendation either require explicit reward modeling or only support pairwise preference comparison. The former incurs substantial computational cost, while the latter hinders training efficiency on negative samples. Moreover, no existing effort has explored preference alignment solutions for tail-item recommendation. To bridge these gaps, we propose LPO4Rec, which extends the Bradley-Terry model from pairwise comparison to listwise comparison to improve the efficiency of model training. Specifically, we derive a closed-form optimal policy to enable more efficient and effective training without explicit reward modeling. We also present an adaptive negative sampling and reweighting strategy to…
