Value Function Decomposition in Markov Recommendation Process

Xiaobei Wang; Shuchang Liu; Qingpeng Cai; Xiang Li; Lantao Hu; Han Li; Guangming Xie

arXiv:2501.17409 · cs.IR · February 4, 2025

TL;DR

This paper proposes a value function decomposition method for reinforcement learning in recommender systems, improving long-term reward estimation, learning speed, and robustness by disentangling stochastic policy and environment factors.

Contribution

It introduces a novel disentangled learning framework that separately models stochastic policy and user environment factors in value function estimation for better recommendation performance.

Findings

01  Faster convergence in value estimation

02  Enhanced robustness against action exploration

03  Improved long-term reward prediction accuracy

Abstract

Recent advances in recommender systems have shown that user-system interaction essentially formulates long-term optimization problems, and online reinforcement learning can be adopted to improve recommendation performance. The general solution framework incorporates a value function that estimates the user's expected cumulative rewards in the future and guides the training of the recommendation policy. To avoid local maxima, the policy may explore potential high-quality actions during inference to increase the chance of finding better future rewards. To accommodate the stepwise recommendation process, one widely adopted approach to learning the value function is learning from the difference between the values of two consecutive states of a user. However, we argue that this paradigm involves a challenge of Mixing Random Factors: there exist two random factors from the stochastic policy…
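The "difference between the values of two consecutive states" mentioned in the abstract is the idea behind standard temporal-difference (TD) learning. The sketch below is a generic tabular TD(0) update in Python, not the paper's method. The two-state user model, the action names, and the helper `sample_transition` are hypothetical and exist only to illustrate that a single TD target can mix randomness from the policy's action choice with randomness from the user's next state.

```python
import random


def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular TD(0) step: move V(state) toward reward + gamma * V(next_state)."""
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error


def sample_transition(state, rng):
    """Hypothetical toy user model (an assumption, not from the paper)."""
    # Random factor 1: the stochastic policy picks an action.
    action = rng.choice(["a", "b"])
    reward = 1.0 if action == "a" else 0.0
    # Random factor 2: the user environment picks the next state.
    next_state = rng.choice([0, 1])
    return reward, next_state


rng = random.Random(0)
values = [0.0, 0.0]  # one value estimate per user state
state = 0
for _ in range(5000):
    reward, next_state = sample_transition(state, rng)
    # The target reward + gamma * V(next_state) depends on both random draws.
    td0_update(values, state, reward, next_state)
    state = next_state
```

In this toy setup the expected reward is 0.5 per step and the discount is 0.9, so the fixed point of the expected update is 0.5 / (1 - 0.9) = 5 for both states. With a constant step size the estimates keep fluctuating around that value, because each update inherits noise from both the action draw and the next-state draw.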
