Parameterized MDPs and Reinforcement Learning Problems -- A Maximum Entropy Principle Based Framework
Amber Srivastava, Srinivasa M Salapaka

TL;DR
This paper introduces a maximum entropy-based framework for parameterized MDPs and reinforcement learning, improving exploration, robustness, and parameter estimation in noisy environments and complex non-convex problems.
Contribution
It proposes a maximum entropy principle framework for parameterized MDPs that learns a stochastic control policy robust to noisy data, determines unknown state and action parameters, and supports sensitivity analysis with respect to problem parameters.
Findings
Faster convergence compared to Q-learning methods.
Robust solutions in noisy data environments.
Successful application to 5G small cell network routing.
Abstract
We present a framework to address a class of sequential decision making problems. Our framework features learning the optimal control policy with robustness to noisy data, determining the unknown state and action parameters, and performing sensitivity analysis with respect to problem parameters. We consider two broad categories of sequential decision making problems modelled as infinite horizon Markov Decision Processes (MDPs) with (and without) an absorbing state. The central idea underlying our framework is to quantify exploration in terms of the Shannon Entropy of the trajectories under the MDP and determine the stochastic policy that maximizes it while guaranteeing a low value of the expected cost along a trajectory. This resulting policy enhances the quality of exploration early on in the learning process, and consequently allows faster convergence rates and robust solutions even…
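The trade-off described in the abstract, a stochastic policy that keeps entropy high while holding expected cost low, can be illustrated with a minimal sketch. This is not the authors' algorithm: it uses standard entropy-regularized (soft) value iteration, which regularizes only the policy entropy rather than the full trajectory entropy, on a hypothetical two-state MDP. The names soft_min and soft_value_iteration, the trade-off parameter beta, the discount factor gamma, and the toy transition and cost tables are all assumptions made for illustration.

```python
import math


def soft_min(values, beta):
    """Smooth minimum: -(1/beta) * log(sum(exp(-beta * v))), computed stably."""
    m = min(values)
    return m - math.log(sum(math.exp(-beta * (v - m)) for v in values)) / beta


def soft_value_iteration(P, C, beta, gamma=0.9, iters=500):
    """Entropy-regularized value iteration for a cost-minimizing MDP.

    P[s][a] is a list of next-state probabilities and C[s][a] is the
    immediate cost (both hypothetical inputs). Returns the soft value
    function and a Gibbs policy with pi(a|s) proportional to exp(-beta * Q(s, a)).
    """
    n = len(P)
    V = [0.0] * n
    for _ in range(iters):
        # Soft Bellman backup: immediate cost plus discounted expected value.
        Q = [[C[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))
              for a in range(len(P[s]))] for s in range(n)]
        V = [soft_min(Q[s], beta) for s in range(n)]
    # Because V(s) is the soft minimum of Q(s, .), each row sums to 1.
    policy = [[math.exp(-beta * (q - V[s])) for q in Q[s]] for s in range(n)]
    return V, policy


# Toy MDP (assumed): state 0 has two actions, state 1 is absorbing with zero cost.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[0.0, 1.0]],              # state 1: single action, stays put
]
C = [
    [1.0, 2.0],  # staying costs 1 per step, moving costs 2 once
    [0.0],
]

V_explore, pi_explore = soft_value_iteration(P, C, beta=0.1)   # entropy dominates
V_exploit, pi_exploit = soft_value_iteration(P, C, beta=50.0)  # cost dominates
```

In this toy problem, a small beta keeps both actions in state 0 in play, while a large beta concentrates nearly all probability on the action that reaches the zero-cost absorbing state.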
Taxonomy
Methods: Double Q-learning · Q-learning
