RLHF Fine-Tuning of LLMs for Alignment with Implicit User Feedback in Conversational Recommenders

Zhongheng Yang; Aijia Sun; Yushang Zhao; Yinuo Yang; Dannier Li; Chengrui Zhou

arXiv:2508.05289·cs.LG·August 8, 2025

TL;DR

This paper uses reinforcement learning from human feedback (RLHF) to fine-tune large language models for conversational recommender systems, aligning recommendations with implicit user signals in multi-turn dialogues.

Contribution

It presents a novel RLHF fine-tuning method that leverages implicit user feedback signals to improve LLM-based conversational recommenders, outperforming traditional supervised fine-tuning.

Findings

01 Enhanced top-k recommendation accuracy

02 Improved coherence and user satisfaction

03 Effective alignment with implicit user signals

Abstract

Conversational recommender systems (CRS) based on large language models (LLMs) must be continually aligned with user preferences to provide satisfying, context-relevant item recommendations. Traditional supervised fine-tuning cannot capture implicit feedback signals such as dwell time, sentiment polarity, or engagement patterns. In this paper, we present a fine-tuning approach that uses reinforcement learning from human feedback (RLHF) to maximize implicit user feedback (IUF) in a multi-turn recommendation context. We specify a reward model $R_{\phi}$ learned on weakly labelled engagement information and maximize user-centric utility by optimizing the foundational LLM $M_{\theta}$ with proximal policy optimization (PPO). The architecture models conversational state transitions $s_{t} \to a_{t} \to s_{t+1}$, where the action $a_{t}$ is associated with LLM-generated item…
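To make the two ingredients named in the abstract concrete, here is a minimal sketch of a scalar reward derived from implicit signals and of the standard PPO clipped surrogate loss. It is not the authors' implementation: the paper learns $R_{\phi}$ from weakly labelled engagement data, whereas `proxy_reward` below is a hand-weighted stand-in, and the names `ImplicitFeedback`, `proxy_reward`, `ppo_clipped_loss`, the weights, and `dwell_scale` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ImplicitFeedback:
    """Hypothetical container for the signals the abstract lists."""
    dwell_seconds: float  # time the user spent on the recommended item
    sentiment: float      # polarity of the user's next utterance, in [-1, 1]
    engaged: bool         # e.g. clicked or asked a follow-up question


def proxy_reward(fb, w_dwell=0.5, w_sent=0.3, w_eng=0.2, dwell_scale=30.0):
    """Hand-weighted stand-in for a learned reward model R_phi (assumption)."""
    dwell = 1.0 - math.exp(-fb.dwell_seconds / dwell_scale)  # squash to [0, 1)
    sent = (fb.sentiment + 1.0) / 2.0                        # map to [0, 1]
    return w_dwell * dwell + w_sent * sent + w_eng * float(fb.engaged)


def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate, averaged over sampled actions a_t."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_theta(a|s) / pi_theta_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


# Usage: score two sampled recommendations, center the rewards to get
# simple advantages, then evaluate the loss for the updated policy.
feedback = [ImplicitFeedback(45.0, 0.6, True), ImplicitFeedback(2.0, -0.4, False)]
rewards = [proxy_reward(fb) for fb in feedback]
baseline = sum(rewards) / len(rewards)
advantages = [r - baseline for r in rewards]
loss = ppo_clipped_loss([-1.1, -2.0], [-1.2, -1.9], advantages)
```

In a real pipeline the log-probabilities would come from the LLM policy $M_{\theta}$ and a frozen copy of it, and the advantages from a value baseline rather than a batch mean; the clipping keeps each update close to the old policy.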
