Learning from Bandit Feedback: An Overview of the State-of-the-art

Olivier Jeunen; Dmytro Mykhaylov; David Rohde; Flavian Vasile; Alexandre Gilotte; Martin Bompaire

arXiv:1909.08471·cs.IR·September 19, 2019·5 cites

TL;DR

This paper reviews and compares various Counterfactual Risk Minimisation methods for learning from bandit feedback in recommender systems, highlighting their differences, similarities, and empirical performance in a simulation environment.

Contribution

It provides the first comprehensive comparison of bandit algorithms in recommender systems, analyzing different off-policy estimators and their empirical effectiveness.

Findings

01. Importance sampling improves policy evaluation accuracy.
02. Variance reduction techniques enhance learning robustness.
03. Empirical results favor certain estimators over others in RecoGym.

Abstract

In machine learning we often try to optimise a decision rule that would have worked well over a historical dataset; this is the so-called empirical risk minimisation principle. In the context of learning from recommender system logs, applying this principle becomes a problem because we do not observe the reward of decisions we did not take. To handle this "bandit-feedback" setting, several Counterfactual Risk Minimisation (CRM) methods have been proposed in recent years that attempt to estimate the performance of different policies on historical data. Through importance sampling and various variance reduction techniques, these methods allow more robust learning and inference than classical approaches. It is difficult to accurately estimate the performance of policies that frequently perform actions that were rarely taken in the past, and a number of different types of…
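The importance-sampling idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract excerpt does not name specific estimators, so the ones shown here (inverse propensity scoring, weight clipping, and self-normalisation) are commonly used textbook variants chosen as an assumption for illustration, not necessarily the exact set the paper compares. The function name `off_policy_estimates`, its parameters, and the toy log are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def off_policy_estimates(
    rewards: Sequence[float],
    logging_probs: Sequence[float],
    target_probs: Sequence[float],
    clip: Optional[float] = None,
) -> Tuple[float, float]:
    """Estimate the value of a target policy from logged bandit feedback.

    Each logged sample i has an observed reward, the probability with which
    the logging policy chose the logged action, and the probability the
    target policy would assign to that same action.

    Returns (ips, snips): the importance-sampling estimate and its
    self-normalised variant. If `clip` is given, importance weights are
    capped at that value, which lowers variance at the cost of bias.
    """
    if len(rewards) == 0:
        raise ValueError("need at least one logged sample")

    weights = []
    for p_log, p_new in zip(logging_probs, target_probs):
        if p_log <= 0.0:
            raise ValueError("logging probabilities must be positive")
        w = p_new / p_log  # importance weight for this sample
        if clip is not None:
            w = min(w, clip)  # clipping as a simple variance reduction
        weights.append(w)

    weighted_rewards = [w * r for w, r in zip(weights, rewards)]
    ips = sum(weighted_rewards) / len(weighted_rewards)

    total_weight = sum(weights)
    snips = sum(weighted_rewards) / total_weight if total_weight > 0 else 0.0
    return ips, snips


# Toy log (made-up numbers): the target policy favours the actions
# that the logging policy took only rarely.
rewards = [1.0, 0.0, 1.0, 0.0]
logging_probs = [0.5, 0.5, 0.25, 0.25]
target_probs = [0.25, 0.25, 0.5, 0.5]

print(off_policy_estimates(rewards, logging_probs, target_probs))            # (0.625, 0.5)
print(off_policy_estimates(rewards, logging_probs, target_probs, clip=1.0))  # (0.375, 0.5)
```

In this toy log, the samples where the target policy acts more often than the logging policy did receive weight 2.0, so a single rewarded sample contributes most of the unclipped estimate. This is the high-variance regime the abstract describes, and clipping or self-normalising the weights are two simple ways to dampen it.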

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Recommender Systems and Techniques