Guided Policy Exploration for Markov Decision Processes using an Uncertainty-Based Value-of-Information Criterion

Isaac J. Sledge; Matthew S. Emigh; Jose C. Principe

arXiv:1802.01518·cs.AI·March 6, 2018

TL;DR

This paper introduces an uncertainty-based, information-theoretic method for more efficient policy exploration in reinforcement learning, improving coverage of the policy space during training.

Contribution

The paper proposes a guided stochastic search strategy that combines a value-of-information criterion with a state-transition uncertainty factor to improve exploration.

Findings

01. More effective coverage of the policy space during the early training stages

02. Guided exploration reduces the number of episodes needed to search the policy space

03. Enhanced exploration leads to better policy learning outcomes

Abstract

Reinforcement learning in environments with many action-state pairs is challenging. At issue is the number of episodes needed to thoroughly search the policy space. Most conventional heuristics address this search problem in a stochastic manner. This can leave large portions of the policy space unvisited during the early training stages. In this paper, we propose an uncertainty-based, information-theoretic approach for performing guided stochastic searches that more effectively cover the policy space. Our approach is based on the value of information, a criterion that provides the optimal trade-off between expected costs and the granularity of the search process. The value of information yields a stochastic routine for choosing actions during learning that can explore the policy space in a coarse-to-fine manner. We augment this criterion with a state-transition uncertainty factor, which…
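To make the coarse-to-fine idea concrete, here is a minimal sketch of a stochastic action-selection rule of the kind the abstract describes. It is an illustration under stated assumptions, not the authors' update: it assumes a prior-weighted soft-max over action-value estimates, with a hypothetical trade-off parameter `beta` controlling the granularity of the search. The names `voi_action_probs`, `sample_action`, `q_values`, `prior`, and `beta` are all invented for this example, higher values are assumed to be better, and the state-transition uncertainty factor is left out because the abstract is cut off before describing it.

```python
import math
import random


def voi_action_probs(q_values, prior, beta):
    """Prior-weighted soft-max over the actions of a single state.

    Assumed form (not taken from the paper): pi(a) is proportional to
    prior(a) * exp(beta * q(a)). The prior must be strictly positive.
    """
    logits = [math.log(p) + beta * q for p, q in zip(prior, q_values)]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(value - top) for value in logits]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(q_values, prior, beta, rng=random):
    """Draw one action index from the distribution above."""
    probs = voi_action_probs(q_values, prior, beta)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]      # hypothetical action-value estimates
    prior = [0.2, 0.3, 0.5]  # hypothetical prior over actions
    for beta in (0.0, 1.0, 50.0):
        print(beta, voi_action_probs(q, prior, beta))
```

With `beta = 0` the rule ignores the value estimates and samples from the prior, which corresponds to a coarse search. As `beta` grows, probability mass concentrates on the highest-valued action, which corresponds to a fine search. Raising `beta` over the course of training is one plausible way to move from coarse to fine exploration.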
