Direct Preference Optimization for LLM-Enhanced Recommendation Systems

Chao Sun; Yaobo Liang; Yaming Yang; Shilin Xu; Tianmeng Yang; Yunhai Tong

arXiv:2410.05939 · cs.IR · April 3, 2025

TL;DR

This paper introduces DPO4Rec, a novel framework that enhances LLM-based recommendation systems by integrating direct preference optimization, leading to improved re-ranking performance and better alignment with recommendation goals.

Contribution

The paper proposes DPO4Rec, a new method that combines preference inference, reward modeling, and structure alignment to improve LLMs for recommendation tasks.

Findings

01 Significant improvement in re-ranking performance over baselines

02 Enhanced instruction-following capabilities of LLMs in recommendations

03 Effective alignment of LLM outputs with recommendation objectives

Abstract

Large Language Models (LLMs) have exhibited remarkable performance across a wide range of domains, motivating research into their potential for recommendation systems. Early efforts have leveraged LLMs' rich knowledge and strong generalization capabilities via in-context learning, where recommendation tasks are framed as prompts. However, LLM performance in recommendation scenarios remains limited due to the mismatch between their pretraining objectives and recommendation tasks, as well as the lack of recommendation-specific data during pretraining. To address these challenges, we propose DPO4Rec, a novel framework that integrates Direct Preference Optimization (DPO) into LLM-enhanced recommendation systems. First, we prompt the LLM to infer user preferences from historical interactions, which are then used to augment traditional ID-based sequential recommendation models. Next, we train…
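For readers unfamiliar with DPO, the following is a minimal sketch of the standard DPO loss for a single preference pair, as it is usually stated in the general DPO literature. It is not taken from DPO4Rec itself, whose training details are truncated above. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions; the inputs are assumed to be summed log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") response under the trained policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    All names and the default beta are assumptions for illustration;
    they are not taken from the DPO4Rec paper.
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled margin by which the policy prefers the chosen response
    # more strongly than the reference model does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # Loss is -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Hypothetical log-probabilities, chosen only to exercise the function.
    print(dpo_loss(-1.0, -2.0, -1.5, -1.5))
```

The loss falls below log 2 when the policy favors the chosen response more than the reference does, and rises above log 2 in the opposite case, which is what pushes the policy toward the preferred outputs.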

Taxonomy

Topics: Recommender Systems and Techniques · Advanced Bandit Algorithms Research · Machine Learning in Healthcare

Methods: Direct Preference Optimization · ALIGN