Fine-Tuning Diffusion-Based Recommender Systems via Reinforcement Learning with Reward Function Optimization
Yu Hou, Hua Li, Ha Young Kim, Won-Yong Shin

TL;DR
ReFiT introduces a reinforcement learning-based fine-tuning framework for diffusion recommender systems, improving recommendation quality and efficiency by directly optimizing a task-specific reward function.
Contribution
It presents a novel RL fine-tuning method that formulates diffusion model optimization as an MDP with a collaborative reward, enhancing recommendation performance.
Findings
Up to 36.3% performance improvement over competitors
Computational complexity that scales linearly with the number of users and items
Effective across multiple diffusion recommendation scenarios
Abstract
Diffusion models have recently emerged as a powerful paradigm for recommender systems, offering state-of-the-art performance by modeling the generative process of user-item interactions. However, training such models from scratch is computationally expensive and yields diminishing returns once convergence is reached. To remedy these challenges, we propose ReFiT, a new framework that integrates Reinforcement learning (RL)-based Fine-Tuning into diffusion-based recommender systems. In contrast to prior RL approaches for diffusion models that depend on external reward models, ReFiT adopts a task-aligned design: it formulates the denoising trajectory as a Markov decision process (MDP) and incorporates a collaborative signal-aware reward function that directly reflects recommendation quality. By tightly coupling the MDP structure with this reward signal, ReFiT empowers the RL agent to exploit…
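To make the idea of treating a denoising trajectory as an MDP more concrete, here is a minimal Python sketch. It is not ReFiT's implementation: the abstract does not give the paper's state, action, or reward definitions, so everything below is an assumption for illustration. Each reverse step is treated as one action sampled from a Gaussian policy, a recall@k score against held-out items stands in for the collaborative reward, and a plain REINFORCE-style surrogate stands in for the RL objective. The names `recall_at_k_reward`, `gaussian_log_prob`, `rollout`, `reinforce_surrogate`, and `denoise_mean` are all hypothetical.

```python
import math
import random


def recall_at_k_reward(scores, held_out, k):
    """Assumed stand-in reward: share of held-out items ranked in the top k."""
    top_k = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    hits = len(set(top_k) & set(held_out))
    return hits / max(len(held_out), 1)


def gaussian_log_prob(x, mean, sigma):
    """Log-density of x under an isotropic Gaussian N(mean, sigma^2 I)."""
    d = len(x)
    sq_dist = sum((a - m) ** 2 for a, m in zip(x, mean))
    return (-0.5 * sq_dist / sigma ** 2
            - d * math.log(sigma)
            - 0.5 * d * math.log(2 * math.pi))


def rollout(x_T, denoise_mean, sigmas, rng):
    """Sample one reverse trajectory x_T -> x_0.

    State: (x_t, t). Action: the sampled x_{t-1}. `denoise_mean` is a
    hypothetical callable standing in for the diffusion model's mean.
    """
    x, log_probs = list(x_T), []
    for t in range(len(sigmas), 0, -1):
        sigma = sigmas[t - 1]
        mean = denoise_mean(x, t)
        x_prev = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        log_probs.append(gaussian_log_prob(x_prev, mean, sigma))
        x = x_prev
    return x, log_probs


def reinforce_surrogate(log_probs, reward, baseline=0.0):
    """REINFORCE-style loss: the terminal reward weights the trajectory log-prob."""
    return -(reward - baseline) * sum(log_probs)


# Toy usage: 4 items, a dummy mean function that shrinks the state toward 0.
rng = random.Random(0)
x0, log_probs = rollout([0.0] * 4, lambda x, t: [0.5 * v for v in x],
                        [0.1, 0.2, 0.3], rng)
reward = recall_at_k_reward(x0, held_out=[1, 2], k=2)
loss = reinforce_surrogate(log_probs, reward)
```

In a real setting, the loss would be differentiated with respect to the diffusion model's parameters by an autodiff framework. The sketch only computes its scalar value.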
Taxonomy
Topics: Recommender Systems and Techniques · Advanced Technologies in Various Fields · Advanced Bandit Algorithms Research
