Variance-Optimal Augmentation Logging for Counterfactual Evaluation in Contextual Bandits
Aaron David Tucker, Thorsten Joachims

TL;DR
This paper introduces Minimum Variance Augmentation Logging (MVAL), a method for designing data-gathering policies that augment existing bandit feedback so as to reduce the variance of counterfactual evaluation and learning in contextual bandits, improving over naive data-gathering approaches.
Contribution
The paper proposes MVAL, an approach for constructing logging policies that minimize the variance of the downstream counterfactual evaluation or learning problem, together with efficient methods for computing these policies and an empirical demonstration of their effectiveness.
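One way to make the variance-minimization idea concrete, offered as a hedged sketch rather than the paper's own derivation: take a single context, an existing log of n_0 samples from a logging policy π_0, n_1 new samples from an augmentation policy π_1, and a mixture IPS estimator that weights each logged reward by π(a)/μ(a), where π is the target policy and μ is the pooled logging distribution. With s(a) = E[r² | a] assumed known, minimizing the estimator's second-moment term over π_1 (restricted to the probability simplex) is a convex problem, and its KKT conditions give a water-filling solution:

```latex
\begin{aligned}
&\min_{\pi_1} \; \sum_{a} \frac{\pi(a)^2\, s(a)}{\mu(a)},
\qquad \mu(a) = \frac{n_0\,\pi_0(a) + n_1\,\pi_1(a)}{N}, \quad N = n_0 + n_1, \\
&\mu^\star(a) = \max\!\Big(\frac{n_0}{N}\,\pi_0(a),\; c\,\pi(a)\sqrt{s(a)}\Big),
\qquad \pi_1^\star(a) = \frac{N\,\mu^\star(a) - n_0\,\pi_0(a)}{n_1},
\end{aligned}
```

where c > 0 is the unique level at which the μ*(a) sum to one. Under these assumptions, actions already well covered by the existing log receive no new samples, and the remaining budget goes to actions where π(a)√s(a) is large relative to their existing coverage.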
Findings
MVAL reduces estimator variance substantially more than naive data-gathering approaches.
Efficient algorithms for computing MVAL policies are developed.
MVAL improves the reliability of offline evaluation in contextual bandits.
Abstract
Methods for offline A/B testing and counterfactual learning are seeing rapid adoption in search and recommender systems, since they allow efficient reuse of existing log data. However, there are fundamental limits to using existing log data alone, since the counterfactual estimators that are commonly used in these methods can have large bias and large variance when the logging policy is very different from the target policy being evaluated. To overcome this limitation, we explore the question of how to design data-gathering policies that most effectively augment an existing dataset of bandit feedback with additional observations for both learning and evaluation. To this effect, this paper introduces Minimum Variance Augmentation Logging (MVAL), a method for constructing logging policies that minimize the variance of the downstream evaluation or learning problem. We explore multiple…
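A minimal Python/NumPy sketch of the augmentation idea, under stated assumptions: a single context, a known per-action second moment s(a) = E[r² | a], and a mixture (balanced) IPS estimator over the pooled log, whose per-sample second moment Σ_a π(a)² s(a)/μ(a) is used here as the variance surrogate. The helper names `second_moment` and `min_variance_augmentation` and the toy numbers are hypothetical illustrations, not the authors' code. The sketch compares naive augmentation, which logs new data with the target policy, against the variance-minimizing augmentation policy.

```python
import numpy as np


def second_moment(pi, mu, s):
    # Per-sample second moment sum_a pi(a)^2 s(a) / mu(a) of the IPS term.
    # Divided by the pooled sample size N it upper-bounds the variance of the
    # mixture IPS estimator for one context (a surrogate, assumed here).
    support = pi * s > 0
    return float(np.sum(pi[support] ** 2 * s[support] / mu[support]))


def min_variance_augmentation(pi, pi0, s, n0, n1, iters=100):
    # Hypothetical helper: pick the augmentation policy pi1 minimizing
    # second_moment(pi, mu, s) with mu = (n0 * pi0 + n1 * pi1) / N.
    # KKT conditions give mu(a) = max(b(a), c * q(a)), b = n0 * pi0 / N,
    # q = pi * sqrt(s), with c set by bisection so that mu sums to one.
    N = n0 + n1
    b = n0 * pi0 / N                  # mass already fixed by the existing log
    q = pi * np.sqrt(s)               # unnormalized ideal sampling weights
    if not np.any(q > 0):
        return np.full(len(pi), 1.0 / len(pi))  # objective is zero for any pi1
    total = lambda c: np.maximum(b, c * q).sum()
    lo, hi = 0.0, 1.0
    while total(hi) < 1.0:            # widen the bracket until it holds the root
        hi *= 2.0
    for _ in range(iters):            # bisection on the water level c
        mid = 0.5 * (lo + hi)
        if total(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    mu = np.maximum(b, hi * q)
    pi1 = np.clip((N * mu - n0 * pi0) / n1, 0.0, None)
    return pi1 / pi1.sum()            # remove bisection round-off


# Toy example: the existing logger rarely plays the target's favorite action.
pi = np.array([0.1, 0.1, 0.8])        # target policy
pi0 = np.array([0.8, 0.1, 0.1])       # existing logging policy
s = np.ones(3)                        # assumed E[r^2 | a] for each action
n0, n1 = 1000, 1000
N = n0 + n1

naive = pi.copy()                     # naive augmentation: log with the target
mval = min_variance_augmentation(pi, pi0, s, n0, n1)
for name, pi1 in [("naive", naive), ("min-variance", mval)]:
    mu = (n0 * pi0 + n1 * pi1) / N
    print(name, np.round(pi1, 4), round(second_moment(pi, mu, s), 4))
```

In this toy setting the existing logger puts most of its mass where the target policy rarely acts, so the sketch sends almost all new samples to the under-covered third action (π_1 = [0, 1/30, 29/30]) and lowers the second moment from about 1.544 (naive) to 1.375.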
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Mobile Crowdsensing and Crowdsourcing
