Accelerating the Computation of UCB and Related Indices for Reinforcement Learning

Wesley Cowan, Michael N. Katehakis, and Daniel Pirutinsky

arXiv:1909.13158·cs.LG·October 1, 2019

TL;DR

This paper derives efficient methods for computing the indices used by UCB-type algorithms for Markovian decision processes (MDP-UCB and MDP-DMED). Each index computation reduces to solving a small number of equations, independent of the size of the state space, and experiments report both the computational time savings and the regret performance.

Contribution

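It presents methods that simplify and accelerate the computation of the MDP-UCB and MDP-DMED indices. The MDP-UCB index requires solving only two non-linear equations in two unknowns, and the MDP-DMED index a single equation in one variable, irrespective of the cardinality of the state space.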

Findings

01. Significant reduction in computational time for index calculation.

02. Comparable or improved regret performance in experiments.

03. Effective application to large state space MDPs.

Abstract

In this paper we derive an efficient method for computing the indices associated with an asymptotically optimal upper confidence bound algorithm (MDP-UCB) of Burnetas and Katehakis (1997) that only requires solving a system of two non-linear equations with two unknowns, irrespective of the cardinality of the state space of the Markovian decision process (MDP). In addition, we develop a similar acceleration for computing the indices for the MDP-Deterministic Minimum Empirical Divergence (MDP-DMED) algorithm developed in Cowan et al. (2019), based on ideas from Honda and Takemura (2011), that involves solving a single equation of one variable. We provide experimental results demonstrating the computational time savings and regret performance of these algorithms. In these comparisons we also consider the Optimistic Linear Programming (OLP) algorithm (Tewari and Bartlett, 2008) and a method…
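To make the idea of a state-space-independent index computation concrete, here is a minimal sketch. It is not the paper's algorithm: the abstract does not give the equations, so the sketch assumes the optimistic part of a UCB-type index can be written as maximizing the expected value sum(q * v) over transition distributions q within a Kullback-Leibler ball KL(p || q) <= delta around the empirical estimate p. Under that assumption, the Lagrangian conditions give q_i proportional to p_i / (mu - v_i), so only one scalar multiplier mu (plus a normalizer) has to be found, however many states there are. The function name `kl_optimistic_value`, its parameters, and the use of bisection are all illustrative choices, and the sketch further assumes every p_i is strictly positive.

```python
import math


def kl_optimistic_value(p, v, delta):
    """Hypothetical helper: maximize sum(q*v) subject to KL(p||q) <= delta.

    p     : empirical transition probabilities (assumed strictly positive)
    v     : value estimates for the successor states
    delta : radius of the KL ball (for example log(t) / n in a UCB-type rule)
    Returns (optimistic value, maximizing distribution q).
    """
    vmax, vmin = max(v), min(v)
    # Degenerate cases: no exploration budget, or all values equal.
    if delta <= 0 or vmax == vmin:
        return sum(pi * vi for pi, vi in zip(p, v)), list(p)

    def q_of(mu):
        # Stationarity of the Lagrangian gives q_i proportional to p_i / (mu - v_i).
        w = [pi / (mu - vi) for pi, vi in zip(p, v)]
        s = sum(w)
        return [wi / s for wi in w]

    def kl_gap(mu):
        # KL(p || q(mu)) - delta; positive means q(mu) lies outside the ball.
        q = q_of(mu)
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q)) - delta

    # KL blows up as mu approaches vmax from above and vanishes as mu grows,
    # so bracket the root and bisect on the single scalar mu.
    lo = vmax
    hi = vmax + (vmax - vmin)
    while kl_gap(hi) > 0:
        hi = vmax + 2.0 * (hi - vmax)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if not (lo < mid < hi):  # floating-point resolution reached
            break
        if kl_gap(mid) > 0:
            lo = mid
        else:
            hi = mid

    q = q_of(hi)  # hi is always on the feasible side of the constraint
    return sum(qi * vi for qi, vi in zip(q, v)), q


if __name__ == "__main__":
    value, q = kl_optimistic_value([0.5, 0.3, 0.2], [0.0, 1.0, 2.0], 0.1)
    print(value, q)
```

The cost of each bisection step is linear in the number of successor states, but the search itself stays one-dimensional, which is the kind of reduction the abstract describes. An index in this style would add the immediate reward estimate to the returned value.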

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms