Fine-Tuning Diffusion-Based Recommender Systems via Reinforcement Learning with Reward Function Optimization

Yu Hou; Hua Li; Ha Young Kim; Won-Yong Shin

arXiv:2511.06937 · cs.IR · November 11, 2025

TL;DR

ReFiT introduces a reinforcement learning-based fine-tuning framework for diffusion recommender systems, improving recommendation quality and efficiency by directly optimizing a task-specific reward function.

Contribution

The paper presents an RL-based fine-tuning method that formulates the denoising trajectory of a diffusion recommender as an MDP and pairs it with a collaborative signal-aware reward, so that fine-tuning directly optimizes recommendation quality.

Findings

01. Up to 36.3% performance improvement over competitors

02. Linear complexity in users and items for efficiency

03. Effective across multiple diffusion recommendation scenarios

Abstract

Diffusion models have recently emerged as a powerful paradigm for recommender systems, offering state-of-the-art performance by modeling the generative process of user-item interactions. However, training such models from scratch is computationally expensive and yields diminishing returns once convergence is reached. To remedy these challenges, we propose ReFiT, a new framework that integrates Reinforcement learning (RL)-based Fine-Tuning into diffusion-based recommender systems. In contrast to prior RL approaches for diffusion models that depend on external reward models, ReFiT adopts a task-aligned design: it formulates the denoising trajectory as a Markov decision process (MDP) and incorporates a collaborative signal-aware reward function that directly reflects recommendation quality. By tightly coupling the MDP structure with this reward signal, ReFiT empowers the RL agent to exploit…
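To make the MDP framing in the abstract concrete, here is a minimal sketch in plain Python. It is not ReFiT's implementation: the abstract does not specify the reward, the policy parameterization, or the update rule, so everything below is an assumption for illustration. The names `recall_at_k`, `denoise_mean`, `rollout`, and `reinforce_loss` are hypothetical, the reward is a simple recall@K on held-out items, and the "denoiser" is a toy stand-in for a pre-trained model. Each denoising step is treated as one MDP transition with a Gaussian policy, and the reward arrives only at the end of the trajectory.

```python
import math
import random


def recall_at_k(scores, held_out, k):
    """Hypothetical reward: fraction of held-out items ranked in the top k."""
    top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return len(set(top_k) & set(held_out)) / max(len(held_out), 1)


def denoise_mean(x, t, history):
    """Toy stand-in for a pre-trained denoiser: pulls x toward the user's history."""
    w = 1.0 / (t + 1)
    return [(1 - w) * xi + w * hi for xi, hi in zip(x, history)]


def rollout(history, held_out, steps=5, sigma=0.1, k=2, rng=None):
    """Run one denoising trajectory as an MDP episode.

    State: (x_t, t). Action: x_{t-1}, sampled from a Gaussian policy.
    Reward: zero at every step except the last, where recall@k is computed.
    """
    rng = rng or random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in history]  # start from pure noise
    log_probs = []
    for t in range(steps, 0, -1):
        mean = denoise_mean(x, t, history)
        nxt = [rng.gauss(m, sigma) for m in mean]
        # Log-density of the sampled action under the Gaussian policy.
        log_probs.append(sum(
            -0.5 * ((a - m) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
            for a, m in zip(nxt, mean)
        ))
        x = nxt
    return x, log_probs, recall_at_k(x, held_out, k)


def reinforce_loss(log_probs, reward, baseline=0.0):
    """REINFORCE-style surrogate value; a real setup would differentiate this."""
    return -(reward - baseline) * sum(log_probs)


history = [1.0, 0.0, 1.0, 0.0]   # toy interaction vector for one user
scores, log_probs, reward = rollout(history, held_out=[0, 2])
loss = reinforce_loss(log_probs, reward)
```

In an actual fine-tuning loop, the denoiser would be a neural network, the surrogate would be computed in an autodiff framework, and its gradient would update the denoiser's parameters. The sketch only shows how states, actions, log-probabilities, and a terminal recommendation-quality reward fit together.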

Taxonomy

Topics: Recommender Systems and Techniques · Advanced Technologies in Various Fields · Advanced Bandit Algorithms Research