On Softmax Direct Preference Optimization for Recommendation
Yuxin Chen, Junfei Tan, An Zhang, Zhengyi Yang, Leheng Sheng, Enzhi Zhang, Xiang Wang, Tat-Seng Chua

TL;DR
This paper introduces Softmax-DPO, a novel training objective for LM-based recommenders that incorporates multiple negatives and softmax loss to better model user preferences and improve recommendation accuracy.
Contribution
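The paper proposes Softmax-DPO (S-DPO), an extension of DPO tailored to LM-based recommenders. It extends the Plackett-Luce ranking model to partial rankings over one positive item and multiple negative items, and connects the resulting loss to softmax sampling, to improve preference modeling.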
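A minimal sketch of the objective, assuming the S-DPO loss takes the form -log σ(-log Σ_d exp(β(r_d - r_p))), where r = log π_θ(response | prompt) - log π_ref(response | prompt) is the implicit reward of an item's response under the trained policy π_θ and the frozen reference model π_ref, r_p belongs to the positive item, and the sum runs over the sampled negatives. The function names and argument layout are hypothetical; in a real training loop the inputs would be summed token log-probabilities computed by the LM. With a single negative the expression reduces to the pairwise DPO loss -log σ(β(r_p - r_d)), which is included for comparison.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def s_dpo_loss(policy_pos: float, ref_pos: float,
               policy_negs: Sequence[float], ref_negs: Sequence[float],
               beta: float = 1.0) -> float:
    """Assumed S-DPO loss for one prompt, one positive item and several negatives.

    Each argument is the summed log-probability of an item's response given the
    prompt, under the trained policy (policy_*) or the frozen reference (ref_*).
    """
    if not policy_negs or len(policy_negs) != len(ref_negs):
        raise ValueError("need at least one negative and one reference log-prob per negative")
    r_pos = policy_pos - ref_pos  # implicit reward of the positive item
    # beta * (r_d - r_p) for every negative d
    margins = [beta * ((pn - rn) - r_pos) for pn, rn in zip(policy_negs, ref_negs)]
    return -log_sigmoid(-logsumexp(margins))


def dpo_loss(policy_pos: float, ref_pos: float, policy_neg: float, ref_neg: float,
             beta: float = 1.0) -> float:
    """Standard pairwise DPO loss, for comparison."""
    return -log_sigmoid(beta * ((policy_pos - ref_pos) - (policy_neg - ref_neg)))
```

With one negative, `s_dpo_loss(-2.0, -2.5, [-3.0], [-3.2], beta=0.5)` equals `dpo_loss(-2.0, -2.5, -3.0, -3.2, beta=0.5)`. Adding a second negative strictly increases the loss, because the log-sum-exp grows with every extra term and -log σ(-x) is increasing in x; the positive item is thus pushed above all negatives jointly rather than one pair at a time.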
Findings
S-DPO outperforms existing methods on three real-world datasets.
It effectively models user preferences and boosts recommendation performance.
S-DPO has an inherent ability to mine hard negatives.
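The hard-negative property can be illustrated by differentiating the loss, again assuming the form -log σ(-log Σ_d exp(β(r_d - r_p))). The gradient with respect to each negative's implicit reward r_d is proportional to the softmax weight exp(β r_d) / Σ_d' exp(β r_d'), so negatives the model currently scores highly (hard negatives) receive most of the update. The sketch below uses hypothetical function names, computes those weights, and checks them against a finite-difference gradient.

```python
import math
from typing import List, Sequence


def s_dpo_loss_from_rewards(r_pos: float, neg_rewards: Sequence[float], beta: float = 1.0) -> float:
    """Assumed S-DPO loss written in terms of implicit rewards r = log pi_theta - log pi_ref."""
    margins = [beta * (r_d - r_pos) for r_d in neg_rewards]
    m = max(margins)
    lse = m + math.log(sum(math.exp(x - m) for x in margins))
    # -log sigmoid(-lse) == softplus(lse) == log(1 + exp(lse)), computed stably
    return max(lse, 0.0) + math.log1p(math.exp(-abs(lse)))


def negative_weights(neg_rewards: Sequence[float], beta: float = 1.0) -> List[float]:
    """Softmax over beta * r_d: each negative's share of the gradient (r_pos cancels out)."""
    scaled = [beta * r_d for r_d in neg_rewards]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def finite_difference_grads(r_pos: float, neg_rewards: Sequence[float],
                            beta: float = 1.0, h: float = 1e-6) -> List[float]:
    """Central-difference gradient of the loss with respect to each negative's reward."""
    grads = []
    for i in range(len(neg_rewards)):
        up = list(neg_rewards)
        down = list(neg_rewards)
        up[i] += h
        down[i] -= h
        diff = s_dpo_loss_from_rewards(r_pos, up, beta) - s_dpo_loss_from_rewards(r_pos, down, beta)
        grads.append(diff / (2 * h))
    return grads


if __name__ == "__main__":
    negs = [0.1, 1.0, -0.5]  # toy rewards; the second negative is the hardest
    weights = negative_weights(negs, beta=2.0)
    grads = finite_difference_grads(0.3, negs, beta=2.0)
    total = sum(grads)
    for w, g in zip(weights, grads):
        print(f"softmax weight {w:.4f}  normalized gradient {g / total:.4f}")
```

Every gradient is positive (raising a negative's reward always raises the loss), and the normalized gradients match the softmax weights, so the highest-scoring negative dominates the update without any explicit hard-negative mining step.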
Abstract
Recommender systems aim to predict personalized rankings based on user preference data. With the rise of Language Models (LMs), LM-based recommenders have been widely explored due to their extensive world knowledge and powerful reasoning abilities. Most LM-based recommenders convert historical interactions into language prompts, pair each prompt with a positive item as the target response, and fine-tune the LM with a language modeling loss. However, this objective fails to fully leverage preference data and is not optimized for personalized ranking, which limits the performance of LM-based recommenders. Inspired by recent advances in Direct Preference Optimization (DPO) for human preference alignment and the success of softmax loss in recommendation, we propose Softmax-DPO (S-DPO) to instill ranking information into the LM to help LM-based recommenders distinguish…
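To make the data format concrete: the language-modeling recipe described in the abstract uses only a (prompt, positive item) pair, while S-DPO additionally needs multiple negative items for the same prompt. The sketch below assumes negatives are drawn uniformly at random from items absent from the user's history; the prompt template, the `build_example` function, and the sampling scheme are hypothetical illustrations, not the paper's exact pipeline.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PreferenceExample:
    prompt: str           # language prompt built from the user's history
    positive: str         # the item the user actually interacted with next
    negatives: List[str]  # sampled items the user did not choose


def build_example(history: Sequence[str], target: str, item_pool: Sequence[str],
                  num_negatives: int, rng: random.Random) -> PreferenceExample:
    """Turn one interaction sequence into a (prompt, positive, negatives) example."""
    excluded = set(history) | {target}
    candidates = [item for item in item_pool if item not in excluded]
    if num_negatives > len(candidates):
        raise ValueError("not enough unseen items to sample negatives from")
    # Assumption: uniform random negatives, sampled without replacement
    negatives = rng.sample(candidates, num_negatives)
    # Hypothetical prompt template; the paper's actual wording is not reproduced here
    prompt = ("The user has interacted with: " + ", ".join(history)
              + ". Which item will the user choose next?")
    return PreferenceExample(prompt=prompt, positive=target, negatives=negatives)


if __name__ == "__main__":
    pool = [f"item_{i}" for i in range(20)]
    ex = build_example(["item_1", "item_4", "item_7"], "item_9", pool,
                       num_negatives=3, rng=random.Random(0))
    print(ex.prompt)
    print("positive:", ex.positive, "negatives:", ex.negatives)
```

Plain language-model fine-tuning would train only on `prompt` and `positive`; a preference objective such as S-DPO also scores each entry of `negatives` against the same prompt.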
Taxonomy
Topics: Recommender Systems and Techniques
Methods: Direct Preference Optimization · Softmax
