On Softmax Direct Preference Optimization for Recommendation

Yuxin Chen; Junfei Tan; An Zhang; Zhengyi Yang; Leheng Sheng; Enzhi Zhang; Xiang Wang; Tat-Seng Chua

arXiv:2406.09215 · cs.IR · November 8, 2024 · 3 cites

TL;DR

This paper introduces Softmax-DPO (S-DPO), a training objective for LM-based recommenders that uses multiple negative items per user and a softmax-style loss to better model user preferences and improve recommendation accuracy.
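The objective can be illustrated numerically. The snippet below is a minimal, stdlib-only sketch, not the code in the official repository. It assumes the DPO-style implicit reward (each item's log-ratio log pi_theta(e|x) - log pi_ref(e|x), scaled by beta) and assumes the loss takes the form -log sigmoid(-log sum_d exp(beta * (r_d - r_p))); the function names are hypothetical.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs)); xs must be non-empty."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def s_dpo_loss(pos_logratio: float, neg_logratios: Sequence[float], beta: float = 1.0) -> float:
    """Hypothetical S-DPO loss for a single prompt (sketch, assumed form).

    pos_logratio:  log pi_theta(e_p | x) - log pi_ref(e_p | x) for the positive item
    neg_logratios: the same log-ratio for each sampled negative item (at least one)
    beta:          scale on the implicit reward, as in DPO
    """
    # beta * (r_d - r_p): how strongly each negative "beats" the positive
    margins = [beta * (r_d - pos_logratio) for r_d in neg_logratios]
    # Softmax-style aggregation over all negatives via log-sum-exp
    return -log_sigmoid(-logsumexp(margins))


def dpo_loss(pos_logratio: float, neg_logratio: float, beta: float = 1.0) -> float:
    """Standard pairwise DPO loss: -log sigmoid(beta * (r_p - r_d))."""
    return -log_sigmoid(beta * (pos_logratio - neg_logratio))
```

With one negative, `s_dpo_loss(0.5, [0.2], beta=0.1)` and `dpo_loss(0.5, 0.2, beta=0.1)` agree. Adding a second negative can only increase the loss, because the log-sum-exp grows with every extra term.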

Contribution

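The paper proposes Softmax-DPO (S-DPO), an extension of DPO tailored to LM-based recommenders. It incorporates multiple negative items per user and derives the loss by extending the Plackett-Luce ranking model to partial rankings, a formulation the authors connect to softmax sampling.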
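The link between the Plackett-Luce view and a softmax loss can be written out as a short derivation. This is a worked sketch under stated assumptions, not a quotation of the paper: it assumes the DPO-style implicit reward r(x_u, e) = beta * log(pi_theta(e | x_u) / pi_ref(e | x_u)) (the prompt-dependent normalizer cancels in every reward difference), and the symbols x_u (user prompt), e_p (positive item) and E_d (set of negatives) are our own notation. The probability that e_p is ranked ahead of every item in E_d is a softmax over the candidates, and its negative log can be rewritten in log-sigmoid form:

```latex
\begin{aligned}
r(x_u, e) &= \beta \log \frac{\pi_\theta(e \mid x_u)}{\pi_{\mathrm{ref}}(e \mid x_u)} \\
P(e_p \succ \mathcal{E}_d \mid x_u)
  &= \frac{\exp r(x_u, e_p)}{\exp r(x_u, e_p) + \sum_{e_d \in \mathcal{E}_d} \exp r(x_u, e_d)} \\
  &= \frac{1}{1 + \sum_{e_d \in \mathcal{E}_d} \exp\big(r(x_u, e_d) - r(x_u, e_p)\big)} \\
\mathcal{L}_{\text{S-DPO}}
  &= -\mathbb{E}_{(x_u, e_p, \mathcal{E}_d)}\Big[\log \sigma\Big(-\log \sum_{e_d \in \mathcal{E}_d} \exp\big(r(x_u, e_d) - r(x_u, e_p)\big)\Big)\Big]
\end{aligned}
```

The last line equals the expected -log P(e_p ≻ E_d | x_u), because sigma(-L) = 1 / (1 + e^L). With a single negative the log-sum-exp collapses to one term, and the objective becomes the pairwise DPO loss -log sigma(r(x_u, e_p) - r(x_u, e_d)).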

Findings

1. S-DPO outperforms existing methods on three real-world datasets.
2. S-DPO effectively models user preferences and boosts recommendation performance.
3. S-DPO has an inherent ability to mine hard negatives.
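One way to see where the hard-negative behaviour could come from is to differentiate the loss with respect to each negative's implicit reward. The sketch below is our own derivation rather than the paper's code. It assumes the S-DPO loss has the form log(1 + sum_d exp(beta * (r_d - r_p))), which is equivalent to -log sigmoid(-log sum_d exp(beta * (r_d - r_p))). The function name is hypothetical.

```python
import math
from typing import List, Sequence


def negative_gradient_weights(
    pos_logratio: float, neg_logratios: Sequence[float], beta: float = 1.0
) -> List[float]:
    """d(loss)/d(r_d) for each negative under the assumed S-DPO loss (sketch).

    With m_d = beta * (r_d - r_p), the loss is log(1 + sum_d exp(m_d)), so
    d loss / d r_d = beta * exp(m_d) / (1 + sum_j exp(m_j)).
    """
    margins = [beta * (r_d - pos_logratio) for r_d in neg_logratios]
    # Shift by m = max(0, max margin) so that neither exp(m_d) nor the "1" overflows;
    # the constant 1 becomes exp(0 - m) after the shift.
    m = max(0.0, max(margins))
    denom = math.exp(-m) + sum(math.exp(x - m) for x in margins)
    return [beta * math.exp(x - m) / denom for x in margins]
```

Because all negatives share one softmax denominator, a negative the model currently rates highly relative to the positive receives most of the update, and its presence shrinks the weight on easy negatives. For example, with `beta=1.0`, `pos_logratio=0.0` and negatives `[2.0, 0.0, -2.0]`, the weights are about 0.78, 0.10 and 0.01.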

Abstract

Recommender systems aim to predict personalized rankings based on user preference data. With the rise of Language Models (LMs), LM-based recommenders have been widely explored due to their extensive world knowledge and powerful reasoning abilities. Most LM-based recommenders convert historical interactions into language prompts, pair them with a positive item as the target response, and fine-tune the LM with a language modeling loss. However, this objective does not fully leverage preference data and is not optimized for personalized ranking, which hinders the performance of LM-based recommenders. Inspired by recent advances in Direct Preference Optimization (DPO) for human preference alignment and the success of softmax loss in recommendation, we propose Softmax-DPO (S-DPO) to instill ranking information into the LM to help LM-based recommenders distinguish…
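The data side of this setup turns an interaction history into a prompt, keeps the next item as the positive response, and (for S-DPO) attaches several negative items. The sketch below shows one way to build such a triple. The prompt template, the uniform negative sampling, and the names `PreferenceExample` and `build_preference_example` are all hypothetical illustrations, not the paper's exact recipe.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PreferenceExample:
    prompt: str           # language prompt built from the interaction history
    positive: str         # the item the user actually interacted with next
    negatives: List[str]  # sampled items treated as dispreferred responses


def build_preference_example(
    history: Sequence[str],
    target: str,
    catalog: Sequence[str],
    num_negatives: int,
    seed: int = 0,
) -> PreferenceExample:
    """Turn one interaction sequence into a (prompt, positive, negatives) triple.

    Hypothetical template and uniform sampling; both are illustrative assumptions.
    """
    prompt = (
        "The user has recently interacted with: "
        + ", ".join(history)
        + ". Which item will the user interact with next?"
    )
    # Exclude the target and anything already in the history from the negative pool.
    excluded = set(history) | {target}
    pool = [item for item in catalog if item not in excluded]
    if num_negatives > len(pool):
        raise ValueError("not enough candidate items to sample negatives from")
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    negatives = rng.sample(pool, num_negatives)
    return PreferenceExample(prompt=prompt, positive=target, negatives=negatives)
```

A plain language-modeling objective would use only `prompt` and `positive`. A multi-negative preference loss instead compares the model's likelihood of `positive` against each entry of `negatives`, given the same `prompt`.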

Code & Models

Repositories

chenyuxin1999/s-dpo (PyTorch, official)

Videos

On Softmax Direct Preference Optimization for Recommendation · SlidesLive

Taxonomy

Topics: Recommender Systems and Techniques

Methods: Direct Preference Optimization · Softmax