Local Policy Improvement for Recommender Systems

Dawen Liang; Nikos Vlassis

arXiv:2212.11431 · cs.LG · April 28, 2023

TL;DR

This paper introduces a local policy improvement method for recommender systems. The method optimizes a lower bound on the expected reward of the target policy and needs no off-policy correction, which suits settings with frequent policy updates and data reuse.

Contribution

The paper proposes a novel local policy improvement approach that avoids importance sampling, making policy updates in recommender systems more practical and efficient.

Findings

01. The method effectively improves policies in sequential recommendation tasks.

02. It avoids the practical limitations of importance sampling correction.

03. The paper provides empirical evidence and practical guidelines for implementation.

Abstract

Recommender systems predict what items a user will interact with next, based on their past interactions. The problem is often approached through supervised learning, but recent advancements have shifted towards policy optimization of rewards (e.g., user engagement). One challenge with the latter is policy mismatch: we are only able to train a new policy given data collected from a previously-deployed policy. The conventional way to address this problem is through importance sampling correction, but this comes with practical limitations. We suggest an alternative approach of local policy improvement without off-policy correction. Our method computes and optimizes a lower bound of expected reward of the target policy, which is easy to estimate from data and does not involve density ratios (such as those appearing in importance sampling correction). This local policy improvement paradigm…
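The abstract is cut off above and does not state the bound itself. As a rough illustration only, the sketch below uses one well-known density-ratio-free bound, obtained from the inequality x >= 1 + log x applied to the ratio of target to logging probabilities with non-negative rewards. This is an assumption made for the example and is not necessarily the bound derived in the paper. The single-step setting, the four-item catalogue, and all names (`expected_reward`, `log_lower_bound`, `surrogate_from_logs`) are hypothetical.

```python
import numpy as np


def expected_reward(policy, reward):
    """Exact expected reward of a policy over a small discrete item set."""
    return float(np.dot(policy, reward))


def log_lower_bound(target, logging, reward):
    """Lower bound on expected_reward(target, reward) for non-negative rewards.

    Uses x >= 1 + log(x) with x = target / logging, which gives
    E_logging[r * (1 + log target - log logging)].
    """
    return float(np.sum(logging * reward * (1.0 + np.log(target) - np.log(logging))))


def surrogate_from_logs(target, actions, rewards):
    """Monte Carlo estimate of E_logging[r * log target(a)].

    This is the only part of the bound that depends on the target policy,
    and it needs no logging propensities and no density ratio.
    """
    return float(np.mean(rewards * np.log(target[actions])))


# Hypothetical toy problem: four items with fixed non-negative rewards.
reward = np.array([0.1, 0.5, 1.0, 0.0])
logging = np.array([0.4, 0.3, 0.2, 0.1])   # previously deployed policy
target = np.array([0.2, 0.3, 0.45, 0.05])  # candidate new policy

true_value = expected_reward(target, reward)
bound = log_lower_bound(target, logging, reward)
bound_at_logging = log_lower_bound(logging, logging, reward)

# Simulated interaction logs drawn from the logging policy.
rng = np.random.default_rng(0)
actions = rng.choice(len(logging), size=100_000, p=logging)
rewards = reward[actions]

estimated_surrogate = surrogate_from_logs(target, actions, rewards)
exact_surrogate = float(np.sum(logging * reward * np.log(target)))

print("true value of target policy:", true_value)
print("lower bound:", bound)
print("bound evaluated at the logging policy:", bound_at_logging)
print("surrogate (estimated vs exact):", estimated_surrogate, exact_surrogate)
```

In this sketch the bound is tight when the target equals the logging policy, so any target whose bound exceeds the logging policy's value is guaranteed to be an improvement. Because the bound loosens as the two policies drift apart, it is only informative for targets close to the logging policy, which is one way to read the word "local" in the abstract.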

Taxonomy

Topics: Recommender Systems and Techniques · Advanced Bandit Algorithms Research · Smart Grid Energy Management