Counterfactual Risk Minimization: Learning from Logged Bandit Feedback

Adith Swaminathan; Thorsten Joachims

arXiv:1502.02362 · cs.LG · May 22, 2015 · 125 cites

TL;DR

This paper introduces the Counterfactual Risk Minimization principle and the POEM algorithm for effective batch learning from logged bandit feedback, improving robustness and generalization in structured output prediction.

Contribution

The paper develops a learning framework based on propensity scoring and generalization error bounds, leading to the POEM algorithm for learning stochastic linear rules for structured output prediction.

Findings

01. On multi-label classification problems, POEM improves robustness and generalization over state-of-the-art methods.

02. The generalization error bounds account for the variance of the propensity-weighted empirical risk estimator.

03. The POEM objective decomposes in a way that enables efficient stochastic gradient optimization.

Abstract

We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback. This learning setting is ubiquitous in online systems (e.g., ad placement, web search, recommendation), where an algorithm makes a prediction (e.g., ad ranking) for a given input (e.g., query) and observes bandit feedback (e.g., user clicks on presented ads). We first address the counterfactual nature of the learning problem through propensity scoring. Next, we prove generalization error bounds that account for the variance of the propensity-weighted empirical risk estimator. These constructive bounds give rise to the Counterfactual Risk Minimization (CRM) principle. We show how CRM can be used to derive a new learning method -- called Policy Optimizer for Exponential Models (POEM) -- for learning stochastic linear rules for structured output prediction. We present a decomposition…
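
The abstract describes three ingredients: a stochastic linear rule (an exponential model), a propensity-weighted empirical risk estimate, and a penalty for that estimate's variance. The following is a minimal sketch of how these pieces could fit together, not the authors' implementation. The function names, the clipping constant `clip`, the penalty weight `lam`, and the convention that losses lie in [-1, 0] are assumptions made for illustration.

```python
import numpy as np


def softmax_policy(w, features):
    """Probabilities of an exponential-model policy over candidate outputs.

    features: array of shape (n_candidates, d), one joint feature vector
    per candidate output y for a fixed input x (hypothetical layout).
    """
    scores = np.asarray(features, dtype=float) @ np.asarray(w, dtype=float)
    scores = scores - scores.max()  # shift for numerical stability
    probs = np.exp(scores)
    return probs / probs.sum()


def crm_objective(new_probs, logged_propensities, losses, clip=10.0, lam=1.0):
    """Propensity-weighted risk plus a variance penalty (illustrative sketch).

    new_probs[i]           : probability the candidate policy gives to the
                             logged output y_i for input x_i
    logged_propensities[i] : probability the logging policy gave to y_i
    losses[i]              : observed bandit feedback for (x_i, y_i)
    Returns (ips_risk, penalized_objective).
    """
    new_probs = np.asarray(new_probs, dtype=float)
    logged_propensities = np.asarray(logged_propensities, dtype=float)
    losses = np.asarray(losses, dtype=float)

    # Importance weights, clipped at `clip` to limit their influence.
    weights = np.minimum(clip, new_probs / logged_propensities)
    u = losses * weights

    n = len(u)
    ips_risk = u.mean()
    sample_variance = u.var(ddof=1)  # unbiased sample variance
    penalized = ips_risk + lam * np.sqrt(sample_variance / n)
    return ips_risk, penalized


if __name__ == "__main__":
    # Toy logged data: four interactions with made-up values.
    logged_p = [0.5, 0.5, 0.25, 0.25]
    candidate_p = [0.6, 0.4, 0.5, 0.1]
    feedback = [-1.0, 0.0, -1.0, 0.0]
    risk, objective = crm_objective(candidate_p, logged_p, feedback)
    print("propensity-weighted risk:", risk)
    print("variance-penalized objective:", objective)
```

With `lam=0` the second return value reduces to the plain propensity-weighted estimate, which makes it easy to compare a candidate policy with and without the variance penalty.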

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Machine Learning and Data Classification