Estimation Considerations in Contextual Bandits
Maria Dimakopoulou, Zhengyuan Zhou, Susan Athey, Guido Imbens

TL;DR
This paper introduces balanced estimation methods in contextual bandits to reduce bias and improve learning efficiency, providing theoretical regret bounds and demonstrating practical advantages on diverse datasets.
Contribution
It develops novel balanced contextual bandit algorithms with regret bounds and empirically shows their superiority over traditional methods in various settings.
Findings
Balanced bandits reduce estimation bias.
Regret bounds match the state of the art in linear settings.
Improved early-stage learning and lower regret.
Abstract
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation problems along the path of learning. We study a consideration for the exploration vs. exploitation framework that does not arise in multi-armed bandits but is crucial in contextual bandits: the way exploration and exploitation are conducted in the present affects the bias and variance in the potential outcome model estimation in subsequent stages of learning. We develop parametric and non-parametric contextual bandits that integrate balancing methods from the causal inference literature in their estimation to make it less prone to problems of estimation bias. We provide the first regret bound analyses for contextual bandits with balancing in the…
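A minimal sketch of the balancing idea, assuming inverse-propensity weighting as the balancing method and a linear outcome model per arm. Epsilon-greedy exploration is swapped in here only because its assignment probabilities are known exactly. The function `balanced_epsilon_greedy`, the ridge penalty `lam`, and the synthetic data are illustrative assumptions, not the authors' algorithms or code. Each observed reward is weighted by 1/p(a | x), so that regions of context space the policy rarely assigns to an arm are not under-represented in that arm's regression.

```python
import numpy as np


def balanced_epsilon_greedy(contexts, reward_fn, n_arms, eps=0.1, lam=1.0, seed=0):
    """Hypothetical epsilon-greedy linear bandit with per-arm ridge fits
    weighted by inverse propensity scores. Returns (theta_hat, total_reward)."""
    rng = np.random.default_rng(seed)
    T, d = contexts.shape
    # Sufficient statistics of each arm's weighted ridge regression:
    # A_a = lam * I + sum_i w_i x_i x_i^T,  b_a = sum_i w_i r_i x_i.
    A = np.stack([lam * np.eye(d) for _ in range(n_arms)])
    b = np.zeros((n_arms, d))
    theta = np.zeros((n_arms, d))
    total = 0.0
    for t in range(T):
        x = contexts[t]
        greedy = int(np.argmax(theta @ x))
        # Exact assignment probabilities under epsilon-greedy.
        probs = np.full(n_arms, eps / n_arms)
        probs[greedy] += 1.0 - eps
        a = int(rng.choice(n_arms, p=probs))
        r = reward_fn(x, a, rng)
        total += r
        w = 1.0 / probs[a]  # inverse-propensity (balancing) weight
        A[a] += w * np.outer(x, x)
        b[a] += w * r * x
        theta[a] = np.linalg.solve(A[a], b[a])
    return theta, total


# Usage on synthetic data (assumed linear rewards with Gaussian noise).
data_rng = np.random.default_rng(1)
true_theta = np.array([[1.0, -0.5, 0.2], [-0.3, 0.8, 0.5]])
contexts = data_rng.normal(size=(2000, 3))


def reward_fn(x, a, rng):
    return float(x @ true_theta[a] + 0.1 * rng.normal())


theta_hat, total = balanced_epsilon_greedy(contexts, reward_fn, n_arms=2, eps=0.2)
```

Keeping running sums `A` and `b` avoids refitting each arm's regression from scratch at every round. Dropping the weight (setting `w = 1.0`) recovers the unbalanced estimator for comparison.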
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Reinforcement Learning in Robotics
Methods: Causal inference
