AutoML for Contextual Bandits
Praneet Dutta, Joe Cheuk, Jonathan S Kim, Massimo Mascaro

TL;DR
This paper introduces an automated meta-learning pipeline for contextual bandits that improves regret efficiency and convergence with limited samples, outperforming or matching existing models without tuning or feature engineering.
Contribution
It presents a novel end-to-end automated meta-learning approach for approximating the optimal Q function in contextual bandits, enhancing performance and ease of use.
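To make "approximating the Q function" concrete: in a contextual bandit, Q(x, a) is the expected reward of taking arm a in context x, and the greedy policy picks the arm with the highest estimate. The sketch below is an assumption for illustration only. The paper selects its model through automated meta-learning, whereas here a fixed per-arm ridge regression stands in for the learned Q function. The class name `PerArmRidgeQ`, the regularization value, and the synthetic linear environment are all hypothetical.

```python
import numpy as np


class PerArmRidgeQ:
    """Hypothetical stand-in for a learned Q(x, a): one ridge regressor per arm.

    The paper chooses its model via automated meta-learning; this fixed
    linear model is an illustrative assumption, not the authors' method.
    """

    def __init__(self, n_arms: int, n_features: int, reg: float = 1.0):
        self.n_arms = n_arms
        # Per-arm sufficient statistics for closed-form ridge regression:
        # A_a = reg * I + sum(x x^T), b_a = sum(reward * x).
        self.A = np.stack([reg * np.eye(n_features) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, n_features))

    def update(self, x: np.ndarray, arm: int, reward: float) -> None:
        # Only the arm that was actually played receives feedback.
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

    def q_values(self, x: np.ndarray) -> np.ndarray:
        # Q(x, a) = x . w_a, where w_a = A_a^{-1} b_a.
        weights = np.stack(
            [np.linalg.solve(self.A[a], self.b[a]) for a in range(self.n_arms)]
        )
        return weights @ x

    def greedy_arm(self, x: np.ndarray) -> int:
        # Exploit: choose the arm with the highest estimated Q value.
        return int(np.argmax(self.q_values(x)))


# Usage on a hypothetical synthetic environment: arm 0 pays x[0], arm 1 pays x[1].
rng = np.random.default_rng(0)
true_w = np.array([[1.0, 0.0], [0.0, 1.0]])
q = PerArmRidgeQ(n_arms=2, n_features=2)
for _ in range(500):
    x = rng.normal(size=2)
    arm = int(rng.integers(2))  # uniformly random logging policy
    q.update(x, arm, float(true_w[arm] @ x))

print(q.greedy_arm(np.array([1.0, 0.0])))  # → 0
```

Swapping the ridge model for any regressor with the same `update`/`q_values` interface leaves the greedy decision rule unchanged, which is the property an automated model-selection step can exploit.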
Findings
The model outperforms or matches prior models on open-source datasets.
Requires no tuning or feature engineering.
Converges efficiently with limited samples.
Abstract
Contextual bandits are a widely popular technique used in applications such as personalization, recommendation systems, mobile health, and causal marketing. As a dynamic approach, they can be more efficient than standard A/B testing at minimizing regret. We propose an end-to-end automated meta-learning pipeline to approximate the optimal Q function for contextual bandit problems. Our model performs much better than random exploration: it is more regret-efficient and converges with a limited number of samples, while remaining very general and easy to use thanks to the meta-learning approach. We used a linearly annealed ε-greedy exploration policy to define the exploration-versus-exploitation schedule. We tested the system on a synthetic environment to characterize it fully, and evaluated it on open-source datasets to benchmark against prior work. We…
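A linearly annealed ε-greedy policy explores with probability ε, which decreases linearly from a starting value to a floor over a fixed number of steps, and otherwise exploits the current Q estimate. The sketch below shows such a schedule and the cumulative (pseudo-)regret it yields on a toy synthetic environment, compared with pure random exploration. The start and end values of ε, the annealing length, the Gaussian reward noise, the discrete contexts, and the tabular Q estimate are all hypothetical choices; the abstract does not specify them, and this does not reproduce the paper's results.

```python
import numpy as np


def linear_epsilon(step: int, eps_start: float = 1.0, eps_end: float = 0.05,
                   anneal_steps: int = 1000) -> float:
    """Linearly anneal epsilon from eps_start to eps_end, then hold it at eps_end.

    The default values are hypothetical, not taken from the paper.
    """
    remaining = max(0.0, 1.0 - step / anneal_steps)
    return eps_end + (eps_start - eps_end) * remaining


def run_epsilon_greedy(true_means, n_steps: int, seed: int = 0, **schedule) -> float:
    """Toy contextual bandit loop; returns cumulative pseudo-regret.

    true_means[c, a] is the (hypothetical) expected reward of arm a in context c.
    A tabular running mean per (context, arm) stands in for the Q function.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    n_contexts, n_arms = true_means.shape
    counts = np.zeros((n_contexts, n_arms))
    q = np.zeros((n_contexts, n_arms))
    regret = 0.0
    for t in range(n_steps):
        c = int(rng.integers(n_contexts))  # observe a context
        if rng.random() < linear_epsilon(t, **schedule):
            arm = int(rng.integers(n_arms))  # explore uniformly
        else:
            arm = int(np.argmax(q[c]))  # exploit the current Q estimate
        reward = rng.normal(true_means[c, arm], 0.5)
        counts[c, arm] += 1
        q[c, arm] += (reward - q[c, arm]) / counts[c, arm]  # incremental mean
        regret += true_means[c].max() - true_means[c, arm]  # expected-reward gap
    return regret


# Usage: the best arm depends on the context.
means = [[1.0, 0.0], [0.0, 1.0]]
annealed = run_epsilon_greedy(means, 2000, anneal_steps=500)
random_only = run_epsilon_greedy(means, 2000, eps_start=1.0, eps_end=1.0)
print(annealed < random_only)  # → True
```

Setting `eps_start = eps_end = 1.0` turns the same loop into pure random exploration, so both policies are measured with identical code; random exploration pays the full gap about half the time here, while the annealed schedule stops paying it once the estimates separate.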
