Sleeping Experts and Bandits Approach to Constrained Markov Decision Processes

Hyeong Soo Chang

arXiv:1412.4898·math.OC·December 17, 2014·Autom.

TL;DR

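This paper introduces simulation-based algorithms, adapted from playing strategies for the sleeping experts and bandits problem, that find an approximately optimal policy within a given finite policy set in large finite constrained Markov decision processes. It provides convergence guarantees, and the algorithms' computational complexities are independent of the state and action space sizes when the policy set is relatively small.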

Contribution

It adapts sleeping experts and bandits algorithms to constrained MDPs and provides a convergence analysis. The algorithms' computational complexities are independent of the state and action space sizes when the given policy set is relatively small.

Findings

01

The expected performance of the algorithms converges to the value of an optimal policy in the given finite policy set

02

Convergence rates are established for the expected performance

03

The algorithm adapted from sleeping experts also converges almost surely to an optimal policy, at an exponential rate

Abstract

This brief paper presents simple simulation-based algorithms for obtaining an approximately optimal policy in a given finite set in large finite constrained Markov decision processes. The algorithms are adapted from playing strategies for the "sleeping experts and bandits" problem, and their computational complexities are independent of the state and action space sizes if the given policy set is relatively small. We establish convergence of their expected performances to the value of an optimal policy, together with convergence rates, and also almost-sure convergence to an optimal policy with an exponential rate for the algorithm adapted within the context of sleeping experts.
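The general idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes a "follow the awake leader" rule in which every policy in the finite set is simulated once per round (full-information, experts-style feedback), a policy counts as "awake" when its average sampled constraint cost is within a bound, and the awake policy with the highest average sampled reward is selected. The names `follow_awake_leader`, `toy_simulate`, `cost_bound`, and the dictionary-based policies are hypothetical and chosen only for illustration.

```python
import random


def follow_awake_leader(policies, simulate, cost_bound, rounds, seed=0):
    """Pick a policy from a finite set using sampled rewards and costs.

    Illustrative assumption, not the paper's method:
    simulate(policy, rng) returns (reward, cost) for one sampled trajectory.
    """
    rng = random.Random(seed)
    n = len(policies)
    reward_sum = [0.0] * n
    cost_sum = [0.0] * n
    choice = None
    for t in range(1, rounds + 1):
        # Experts-style feedback: every policy is simulated once per round.
        # The work per round depends on the number of policies, not on the
        # sizes of the state and action spaces.
        for i, policy in enumerate(policies):
            reward, cost = simulate(policy, rng)
            reward_sum[i] += reward
            cost_sum[i] += cost
        # A policy is treated as "awake" if its average sampled cost so far
        # satisfies the constraint bound.
        awake = [i for i in range(n) if cost_sum[i] / t <= cost_bound]
        # Follow the awake policy with the best average sampled reward.
        choice = max(awake, key=lambda i: reward_sum[i] / t) if awake else None
    return None if choice is None else policies[choice]


def toy_simulate(policy, rng):
    """Hypothetical simulator: noisy samples around fixed means."""
    return (rng.gauss(policy["reward"], 0.1), rng.gauss(policy["cost"], 0.1))


policies = [
    {"name": "a", "reward": 1.0, "cost": 0.9},  # best reward, violates bound
    {"name": "b", "reward": 0.8, "cost": 0.2},  # best feasible policy
    {"name": "c", "reward": 0.5, "cost": 0.1},
]
best = follow_awake_leader(policies, toy_simulate, cost_bound=0.5, rounds=200)
```

In this toy setup the total work is `rounds * len(policies)` simulator calls, which reflects why a small policy set keeps the cost independent of the underlying state and action spaces. A bandit-style variant would simulate only the selected policy in each round and would need an exploration rule, which this sketch omits.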

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms