Reward Shaping for User Satisfaction in a REINFORCE Recommender

Konstantina Christakopoulou; Can Xu; Sai Zhang; Sriraj Badam; Trevor Potter; Daniel Li; Hao Wan; Xinyang Yi; Ya Le; Chris Berg; Eric Bencomo Dixon; Ed H. Chi; Minmin Chen

arXiv:2209.15166·cs.IR·October 3, 2022·1 cites

Open Access

TL;DR

This paper proposes a reward shaping method for reinforcement learning (RL) recommenders: user satisfaction is measured through surveys, imputed for interactions where it was not observed, and used to shape the reward. The approach is validated in offline and live experiments.

Contribution

It introduces a joint learning framework combining a satisfaction imputation network with a REINFORCE-based policy for better user satisfaction in recommendations.

Findings

01  Imputation models effectively predict satisfaction for unobserved interactions.

02  Reward shaping with satisfaction signals improves recommendation satisfaction.

03  Live experiments show increased user satisfaction in industrial settings.

Abstract

How might we design Reinforcement Learning (RL)-based recommenders that encourage aligning user trajectories with the underlying user satisfaction? Three research questions are key: (1) measuring user satisfaction, (2) combatting sparsity of satisfaction signals, and (3) adapting the training of the recommender agent to maximize satisfaction. For measurement, it has been found that surveys explicitly asking users to rate their experience with consumed items can provide valuable orthogonal information to the engagement/interaction data, acting as a proxy for the underlying user satisfaction. For sparsity, i.e., only being able to observe how satisfied users are with a tiny fraction of user-item interactions, imputation models can be useful in predicting the satisfaction level for all items users have consumed. For learning satisfying recommender policies, we postulate that reward shaping in RL…
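To make the idea in the abstract concrete, the following is a minimal sketch of a REINFORCE update driven by a shaped reward. It is an illustration under stated assumptions, not the authors' implementation: the additive form `engagement + alpha * imputed_satisfaction`, the stateless softmax policy over a handful of items, and every name here (`shaped_reward`, `reinforce_step`, `train`, `alpha`) are hypothetical. In the paper, satisfaction is predicted by a learned imputation network; here it is simply passed in as an array.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def shaped_reward(engagement, imputed_satisfaction, alpha=1.0):
    """Hypothetical additive shaping (an assumption, not the paper's formula):
    the engagement reward plus a weighted imputed satisfaction score."""
    return engagement + alpha * imputed_satisfaction


def reinforce_step(theta, action, reward, lr=0.1):
    """One REINFORCE update for a stateless softmax policy.

    For a softmax policy, grad_theta log pi(action) = onehot(action) - pi.
    """
    probs = softmax(theta)
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0
    return theta + lr * reward * grad_log_pi


def train(engagement, satisfaction, steps=2000, alpha=1.0, seed=0):
    """Sample items from the policy and update it with the shaped reward."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(engagement))
    for _ in range(steps):
        action = rng.choice(len(theta), p=softmax(theta))
        reward = shaped_reward(engagement[action], satisfaction[action], alpha)
        theta = reinforce_step(theta, action, reward)
    return softmax(theta)
```

As a usage example, calling `train(np.array([1.0, 1.0]), np.array([0.0, 1.0]))` models two items with equal engagement where only the second is rated as satisfying. In expectation, the shaped reward moves probability mass toward the second item, whereas an engagement-only reward (`alpha=0`) would give the policy no reason to prefer either.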

Taxonomy

Topics: Recommender Systems and Techniques · Behavioral Health and Interventions · Emotion and Mood Recognition

Methods: REINFORCE