RLHF Fine-Tuning of LLMs for Alignment with Implicit User Feedback in Conversational Recommenders
Zhongheng Yang, Aijia Sun, Yushang Zhao, Yinuo Yang, Dannier Li, Chengrui Zhou

TL;DR
This paper applies reinforcement learning from human feedback (RLHF) to fine-tune large language models for conversational recommender systems, aligning recommendations with implicit user signals in multi-turn dialogues.
Contribution
It presents a novel RLHF fine-tuning method that leverages implicit user feedback signals to improve LLM-based conversational recommenders, outperforming traditional supervised fine-tuning.
Findings
Enhanced top-k recommendation accuracy
Improved coherence and user satisfaction
Effective alignment with implicit user signals
Abstract
Conversational recommender systems (CRS) based on large language models (LLMs) must be continually aligned with user preferences to provide satisfying, context-relevant item recommendations. Traditional supervised fine-tuning cannot capture implicit feedback signals such as dwell time, sentiment polarity, or engagement patterns. In this paper, we present a fine-tuning approach that uses reinforcement learning from human feedback (RLHF) to maximize implicit user feedback (IUF) in a multi-turn recommendation context. We train a reward model on weakly labelled engagement data and optimize the base LLM M_{\theta} with proximal policy optimization (PPO) to maximize user-centric utility. The architecture models conversational state transitions s_t \to a_t \to s_{t+1}, where the action a_t is associated with LLM-generated item…
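The abstract does not spell out the reward model or the PPO update, so the Python sketch below only illustrates the general pattern under stated assumptions. A hand-weighted proxy reward over the implicit signals the abstract names (dwell time, sentiment polarity, engagement) stands in for the paper's learned reward model. The standard PPO clipped surrogate loss is then computed over per-action log-probabilities, using a mean-reward baseline as a simple advantage estimate. `TurnFeedback`, `implicit_reward`, `advantages_from_rewards`, the weights, and `dwell_scale` are hypothetical names and values, not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class TurnFeedback:
    """Implicit signals observed after one recommendation turn (hypothetical schema)."""
    dwell_seconds: float  # time the user spent on the recommended item
    sentiment: float      # sentiment polarity of the user's next reply, assumed in [-1, 1]
    clicked: bool         # whether the user engaged with the recommended item


def implicit_reward(fb: TurnFeedback, w_dwell: float = 0.5, w_sent: float = 0.3,
                    w_click: float = 0.2, dwell_scale: float = 30.0) -> float:
    """Hand-weighted proxy reward (assumption: stands in for a learned reward model).

    Dwell time is squashed into [0, 1) so very long sessions do not dominate.
    """
    dwell_term = 1.0 - math.exp(-fb.dwell_seconds / dwell_scale)
    return w_dwell * dwell_term + w_sent * fb.sentiment + w_click * float(fb.clicked)


def advantages_from_rewards(rewards: list[float]) -> list[float]:
    """Advantage = reward minus the batch-mean baseline (a simple, assumed estimator)."""
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


def ppo_clip_loss(logp_new: list[float], logp_old: list[float],
                  advantages: list[float], eps: float = 0.2) -> float:
    """Negative PPO clipped surrogate objective, averaged over a batch of actions."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                            # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)      # keep the ratio near 1
        total += min(ratio * adv, clipped * adv)             # pessimistic surrogate
    return -total / len(advantages)                          # minimize the negative


if __name__ == "__main__":
    feedback = [TurnFeedback(45.0, 0.6, True), TurnFeedback(3.0, -0.4, False)]
    rewards = [implicit_reward(fb) for fb in feedback]
    adv = advantages_from_rewards(rewards)
    loss = ppo_clip_loss(logp_new=[-1.1, -2.3], logp_old=[-1.2, -2.0], advantages=adv)
    print(rewards, adv, loss)
```

The clipping keeps each update close to the previous policy, which is the property that makes PPO a common choice for fine-tuning a language-model policy. RLHF setups typically also penalize divergence from a reference model; that term is omitted here for brevity.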
