Counterfactual Risk Minimization: Learning from Logged Bandit Feedback
Adith Swaminathan, Thorsten Joachims

TL;DR
This paper introduces the Counterfactual Risk Minimization principle and the POEM algorithm for effective batch learning from logged bandit feedback, improving robustness and generalization in structured output prediction.
Contribution
The paper develops a learning principle based on propensity scoring and variance-aware generalization error bounds, and derives from it the POEM algorithm for learning stochastic linear rules.
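The idea can be written compactly as follows. This is a sketch in notation chosen for this summary, so every symbol is an assumption, not the paper's verbatim statement: logged data are tuples $(x_i, y_i, \delta_i, p_i)$, where $\delta_i$ is the observed loss and $p_i$ is the logging policy's probability of the prediction $y_i$; $h(y \mid x)$ is the candidate stochastic rule; $M$ is a clipping constant; and $\lambda$ is a regularization weight.

```latex
% Clipped propensity-weighted empirical risk (symbols are assumptions, see lead-in)
\hat{R}^{M}(h) = \frac{1}{n} \sum_{i=1}^{n} u_i,
\qquad
u_i = \delta_i \, \min\!\left\{ M, \frac{h(y_i \mid x_i)}{p_i} \right\}

% Sample variance of the weighted losses
\widehat{\mathrm{Var}}(u) = \frac{1}{n-1} \sum_{i=1}^{n} \left( u_i - \hat{R}^{M}(h) \right)^{2}

% Counterfactual Risk Minimization: empirical risk plus a variance penalty
\hat{h}^{\mathrm{CRM}} = \operatorname*{arg\,min}_{h \in \mathcal{H}}
\left\{ \hat{R}^{M}(h) + \lambda \sqrt{\frac{\widehat{\mathrm{Var}}(u)}{n}} \right\}
```

The variance term penalizes rules whose risk estimate depends on a few large importance weights, which is what sets the principle apart from plain empirical risk minimization on reweighted data.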
Findings
POEM outperforms state-of-the-art methods in multi-label classification.
The generalization error bounds account for the variance of the propensity-weighted empirical risk estimator.
Efficient stochastic gradient optimization is enabled by the decomposition of the POEM objective.
Abstract
We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback. This learning setting is ubiquitous in online systems (e.g., ad placement, web search, recommendation), where an algorithm makes a prediction (e.g., ad ranking) for a given input (e.g., query) and observes bandit feedback (e.g., user clicks on presented ads). We first address the counterfactual nature of the learning problem through propensity scoring. Next, we prove generalization error bounds that account for the variance of the propensity-weighted empirical risk estimator. These constructive bounds give rise to the Counterfactual Risk Minimization (CRM) principle. We show how CRM can be used to derive a new learning method -- called Policy Optimizer for Exponential Models (POEM) -- for learning stochastic linear rules for structured output prediction. We present a decomposition…
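The steps described in the abstract (propensity weighting, then a variance-sensitive objective, then a stochastic linear rule) can be illustrated with a small NumPy sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names `crm_objective` and `softmax_policy`, the clipping constant `clip`, the weight `lam`, and the feature layout are all hypothetical choices made for this example.

```python
import numpy as np


def crm_objective(losses, new_probs, logged_probs, clip=10.0, lam=1.0):
    """Clipped propensity-weighted risk plus a variance penalty.

    losses       : observed bandit losses for the logged predictions
    new_probs    : probability the candidate policy gives each logged prediction
    logged_probs : propensity of each logged prediction under the logging policy
    clip, lam    : illustrative hyperparameters (assumed, not from the source)
    """
    losses = np.asarray(losses, dtype=float)
    new_probs = np.asarray(new_probs, dtype=float)
    logged_probs = np.asarray(logged_probs, dtype=float)

    # Importance weights, clipped to keep the estimator's variance bounded.
    weights = np.minimum(clip, new_probs / logged_probs)
    u = losses * weights

    n = u.shape[0]
    risk = u.mean()              # propensity-weighted empirical risk
    variance = u.var(ddof=1)     # sample variance of the weighted losses
    objective = risk + lam * np.sqrt(variance / n)
    return objective, risk, variance


def softmax_policy(w, features):
    """Stochastic linear rule: P(y | x) proportional to exp(w . phi(x, y)).

    features has shape (n, k, d): n inputs, k candidate outputs, d features.
    Returns an (n, k) array of probabilities.
    """
    scores = features @ w
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)
```

To score a candidate weight vector `w` on a log, one would compute `softmax_policy(w, features)`, pick out the probability of each logged prediction, and pass it to `crm_objective` together with the logged losses and propensities. A lower objective indicates a rule that is both estimated to have low risk and estimated reliably.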
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Machine Learning and Data Classification
