Optimization of Epsilon-Greedy Exploration
Ethan Che, Hakan Ceylan, James McInerney, Nathan Kallus

TL;DR
This paper introduces a principled framework for optimizing epsilon-greedy exploration rates in recommendation systems by minimizing Bayesian regret using stochastic gradient descent and Model-Predictive Control, adapting dynamically to various practical constraints.
Contribution
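It presents an approach for determining exploration schedules in recommendation systems by directly minimizing Bayesian regret with SGD and MPC, while accounting for practical constraints such as batched updates, time-varying user traffic, short time horizons, and minimum exploration requirements.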
Findings
Optimization methods outperform heuristics in various settings.
Batch size significantly impacts optimal exploration strategies.
Dynamic calibration improves recommendation performance.
Abstract
Modern recommendation systems rely on exploration to learn user preferences for new items, typically implementing uniform exploration policies (e.g., epsilon-greedy) due to their simplicity and compatibility with machine learning (ML) personalization models. Within these systems, a crucial consideration is the rate of exploration - what fraction of user traffic should receive random item recommendations and how this should evolve over time. While various heuristics exist for navigating the resulting exploration-exploitation tradeoff, selecting optimal exploration rates is complicated by practical constraints including batched updates, time-varying user traffic, short time horizons, and minimum exploration requirements. In this work, we propose a principled framework for determining the exploration schedule based on directly minimizing Bayesian regret through stochastic gradient descent…
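To make the objective concrete, here is a minimal sketch of how the Bayesian regret of a batched epsilon-greedy schedule could be estimated by Monte Carlo simulation. It is not the authors' implementation. The Gaussian prior and noise model, the deterministic split of exploration traffic, and the names simulate_regret and bayes_regret are all assumptions made for illustration. The paper optimizes the schedule with SGD and MPC; this sketch only evaluates the objective and compares a few constant rates by brute force.

```python
import math
import random


def simulate_regret(schedule, batch_sizes, n_arms=5, noise_sd=1.0, rng=None):
    """Regret of one simulated run of batched epsilon-greedy (illustrative model).

    Assumptions: arm means are drawn from a N(0, 1) prior, rewards are
    Gaussian with standard deviation noise_sd, and the model is only
    updated between batches.
    """
    rng = rng or random.Random()
    mu = [rng.gauss(0.0, 1.0) for _ in range(n_arms)]  # true means from the prior
    best = max(mu)
    counts = [0] * n_arms   # pulls observed per arm so far
    sums = [0.0] * n_arms   # total reward observed per arm so far
    regret = 0.0
    for eps, n in zip(schedule, batch_sizes):
        # Posterior mean under the N(0, 1) prior: sum / (noise_var + count).
        post = [s / (noise_sd ** 2 + c) for s, c in zip(sums, counts)]
        greedy = max(range(n_arms), key=post.__getitem__)
        per_arm = int(eps * n / n_arms)      # exploration pulls given to each arm
        greedy_pulls = n - per_arm * n_arms  # remaining traffic goes to the greedy arm
        regret += per_arm * sum(best - m for m in mu)
        regret += greedy_pulls * (best - mu[greedy])
        for k in range(n_arms):
            m = per_arm + (greedy_pulls if k == greedy else 0)
            if m > 0:
                # The sum of m Gaussian rewards is N(m * mu_k, m * noise_sd^2).
                sums[k] += rng.gauss(m * mu[k], noise_sd * math.sqrt(m))
                counts[k] += m
    return regret


def bayes_regret(schedule, batch_sizes, n_sims=2000, seed=0, **kwargs):
    """Monte Carlo estimate of Bayesian regret: average over prior draws and noise."""
    rng = random.Random(seed)
    total = sum(simulate_regret(schedule, batch_sizes, rng=rng, **kwargs)
                for _ in range(n_sims))
    return total / n_sims


if __name__ == "__main__":
    batch_sizes = [100] * 5  # hypothetical traffic: 5 batches of 100 users
    for eps in (0.0, 0.05, 0.1, 0.2, 0.5):
        print(eps, round(bayes_regret([eps] * 5, batch_sizes), 1))
```

Because schedule is a list with one rate per batch, the same estimator can score time-varying schedules and uneven batch_sizes. A gradient-based optimizer would need a differentiable version of this objective, which the sketch does not attempt.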
Taxonomy
Topics: Recommender Systems and Techniques · Advanced Bandit Algorithms Research · Explainable Artificial Intelligence (XAI)
