Guided Policy Exploration for Markov Decision Processes using an Uncertainty-Based Value-of-Information Criterion
Isaac J. Sledge, Matthew S. Emigh, Jose C. Principe

TL;DR
This paper introduces an uncertainty-based, information-theoretic method for more efficient policy exploration in reinforcement learning, improving coverage of the policy space during training.
Contribution
The paper proposes a guided stochastic search strategy that combines a value-of-information criterion with a state-transition uncertainty factor to improve exploration.
Findings
More effective policy space coverage during early training stages
Guided exploration reduces the number of episodes needed to search the policy space
Enhanced exploration leads to better policy learning outcomes
Abstract
Reinforcement learning in environments with many action-state pairs is challenging. At issue is the number of episodes needed to thoroughly search the policy space. Most conventional heuristics address this search problem in a stochastic manner. This can leave large portions of the policy space unvisited during the early training stages. In this paper, we propose an uncertainty-based, information-theoretic approach for performing guided stochastic searches that more effectively cover the policy space. Our approach is based on the value of information, a criterion that provides the optimal trade-off between expected costs and the granularity of the search process. The value of information yields a stochastic routine for choosing actions during learning that can explore the policy space in a coarse to fine manner. We augment this criterion with a state-transition uncertainty factor, which…
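The coarse-to-fine action selection described in the abstract can be illustrated with a small sketch. The abstract does not give the update rule, so the Gibbs-style form below, p(a) proportional to prior(a) * exp(beta * Q(a)), is an assumption chosen for illustration and not the authors' stated method. The names `soft_action_probs`, `choose_action`, `prior`, and `beta` are hypothetical, Q-values are treated as rewards (higher is better), and the state-transition uncertainty factor is not modeled because its definition is cut off in the text above.

```python
import numpy as np

def soft_action_probs(q_values, prior, beta):
    """Return probabilities proportional to prior(a) * exp(beta * Q(a)).

    Assumed form for illustration only. `prior` must be strictly positive.
    """
    q = np.asarray(q_values, dtype=float)
    p0 = np.asarray(prior, dtype=float)
    # Work in log space and subtract the max for numerical stability.
    logits = beta * q + np.log(p0)
    weights = np.exp(logits - logits.max())
    return weights / weights.sum()

def choose_action(q_values, prior, beta, rng):
    """Sample an action index from the soft distribution."""
    probs = soft_action_probs(q_values, prior, beta)
    return int(rng.choice(len(probs), p=probs))

# Hypothetical Q-values for three actions in a single state.
q = [1.0, 2.0, 0.5]
uniform = np.ones(3) / 3

# beta = 0 ignores the Q-values: a coarse, purely prior-driven search.
coarse = soft_action_probs(q, uniform, beta=0.0)
# A large beta concentrates on the best action: a fine, near-greedy search.
fine = soft_action_probs(q, uniform, beta=50.0)

rng = np.random.default_rng(0)
action = choose_action(q, uniform, beta=1.0, rng=rng)
```

Under this assumed form, raising `beta` over the course of training moves the search from broad, prior-driven sampling toward near-greedy selection, which is one way to read the trade-off between expected costs and search granularity mentioned in the abstract.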
