Direct Preference Optimization for LLM-Enhanced Recommendation Systems
Chao Sun, Yaobo Liang, Yaming Yang, Shilin Xu, Tianmeng Yang, Yunhai Tong

TL;DR
This paper introduces DPO4Rec, a novel framework that enhances LLM-based recommendation systems by integrating direct preference optimization, leading to improved re-ranking performance and better alignment with recommendation goals.
Contribution
The paper proposes DPO4Rec, a new method that combines preference inference, reward modeling, and structure alignment to improve LLMs for recommendation tasks.
Findings
Significant improvement in re-ranking performance over baselines
Enhanced instruction-following capabilities of LLMs in recommendations
Effective alignment of LLM outputs with recommendation objectives
Abstract
Large Language Models (LLMs) have exhibited remarkable performance across a wide range of domains, motivating research into their potential for recommendation systems. Early efforts have leveraged LLMs' rich knowledge and strong generalization capabilities via in-context learning, where recommendation tasks are framed as prompts. However, LLM performance in recommendation scenarios remains limited due to the mismatch between their pretraining objectives and recommendation tasks, as well as the lack of recommendation-specific data during pretraining. To address these challenges, we propose DPO4Rec, a novel framework that integrates Direct Preference Optimization (DPO) into LLM-enhanced recommendation systems. First, we prompt the LLM to infer user preferences from historical interactions, which are then used to augment traditional ID-based sequential recommendation models. Next, we train…
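The abstract names Direct Preference Optimization but is cut off before it describes the training step. As a rough illustration only, the sketch below computes the standard pairwise DPO loss for a single (chosen, rejected) pair from sequence log-probabilities. The function names, the example log-probabilities, and the value of beta are assumptions made for illustration; nothing here is taken from the DPO4Rec implementation, which may construct pairs and losses differently.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (hypothetical helper).

    Each argument is the summed log-probability of a full response under
    either the trainable policy or the frozen reference model.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) is the same as softplus(-logits).
    return softplus(-logits)


# Illustrative values only: the policy favours the chosen ranking
# slightly more than the reference model does.
example = dpo_loss(-1.0, -3.0, -1.5, -2.5, beta=0.1)
```

When the policy and the reference assign identical log-probabilities, the loss equals log 2; it falls below that value as the policy shifts probability toward the chosen response relative to the reference.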
Taxonomy
Topics: Recommender Systems and Techniques · Advanced Bandit Algorithms Research · Machine Learning in Healthcare
Methods: Direct Preference Optimization · ALIGN
