Accelerating the Computation of UCB and Related Indices for Reinforcement Learning
Wesley Cowan, Michael N. Katehakis, and Daniel Pirutinsky

TL;DR
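This paper derives efficient methods for computing the indices used by the MDP-UCB and MDP-DMED reinforcement learning algorithms. The MDP-UCB index reduces to a system of two non-linear equations in two unknowns, and the MDP-DMED index to a single equation in one variable, regardless of the size of the state space. Experiments compare the resulting computational time and regret.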
Contribution
The paper presents methods that simplify and accelerate index computation for MDP-UCB and MDP-DMED. The size of the system to be solved does not depend on the cardinality of the state space.
Findings
Significant reduction in computational time for index calculation.
Comparable or improved regret performance in experiments.
Effective application to large state space MDPs.
Abstract
In this paper we derive an efficient method for computing the indices associated with an asymptotically optimal upper confidence bound algorithm (MDP-UCB) of Burnetas and Katehakis (1997) that only requires solving a system of two non-linear equations with two unknowns, irrespective of the cardinality of the state space of the Markovian decision process (MDP). In addition, we develop a similar acceleration for computing the indices for the MDP-Deterministic Minimum Empirical Divergence (MDP-DMED) algorithm developed in Cowan et al. (2019), based on ideas from Honda and Takemura (2011), that involves solving a single equation of one variable. We provide experimental results demonstrating the computational time savings and regret performance of these algorithms. In these comparisons we also consider the Optimistic Linear Programming (OLP) algorithm (Tewari and Bartlett, 2008) and a method…
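The two-equation reduction described in the abstract can be illustrated with a small sketch. It assumes the index has the common form sup{ q·v : KL(p‖q) ≤ δ }, where p is an estimated transition vector with all entries positive, v is a vector of state values, and δ is a confidence radius. These symbols, the function name kl_optimistic_index, and the bisection solver are assumptions made for illustration and are not taken from the paper. Under that assumption the Lagrangian conditions give q_x = λ p_x / (μ − v_x), so only the two unknowns (λ, μ) remain, fixed by normalization and by the KL constraint, whatever the length of p.

```python
import math


def kl_optimistic_index(p, v, delta):
    """Sketch: maximize q.v over distributions q with KL(p||q) <= delta.

    Assumes every p[x] > 0 (hypothetical setup, not the paper's exact index).
    Returns (index_value, q).
    """
    mean = sum(pi * vi for pi, vi in zip(p, v))
    v_max, v_min = max(v), min(v)
    # Degenerate cases: no slack in the constraint, or nothing to gain.
    if delta <= 0 or v_max == v_min:
        return mean, list(p)

    def g(mu):
        # KL(p||q) for q_x = lam * p_x / (mu - v_x), minus delta.
        # lam has been eliminated via the normalization sum(q) = 1.
        s_log = sum(pi * math.log(mu - vi) for pi, vi in zip(p, v))
        s_inv = sum(pi / (mu - vi) for pi, vi in zip(p, v))
        return s_log + math.log(s_inv) - delta

    # g decreases from +inf (mu -> v_max) to -delta (mu -> inf),
    # so grow the upper bracket until g changes sign.
    step = v_max - v_min
    hi = v_max + step
    while g(hi) > 0:
        step *= 2.0
        hi = v_max + step

    # Bisection on (v_max, hi]; lo is never evaluated at v_max itself.
    lo = v_max
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid == lo or mid == hi:
            break
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid

    mu = hi
    lam = 1.0 / sum(pi / (mu - vi) for pi, vi in zip(p, v))
    q = [lam * pi / (mu - vi) for pi, vi in zip(p, v)]
    return sum(qi * vi for qi, vi in zip(q, v)), q
```

In this sketch the cost of each evaluation of g is linear in the number of states, but the root-finding itself is over a single scalar, which is the kind of saving the abstract points to. For two states with p = (0.5, 0.5) and v = (0, 1), the constrained maximum has the closed form (1 + sqrt(1 − exp(−2δ))) / 2, which gives a simple check of the solver.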
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms
