A Minimum Relative Entropy Controller for Undiscounted Markov Decision Processes
Pedro A. Ortega, Daniel A. Braun

TL;DR
This paper introduces BCR-MDP, a controller derived from the Bayesian control rule for solving undiscounted Markov decision processes with unknown dynamics. It balances exploration and exploitation and avoids sub-optimal limit cycles.
Contribution
It derives a non-parametric conjugate prior over the policy space and a Gibbs sampler for drawing random policies, both within the Bayesian control rule framework for undiscounted MDPs, to address the exploration-exploitation trade-off.
Findings
BCR-MDP avoids sub-optimal limit cycles.
The method effectively balances exploration and exploitation.
Preliminary results are promising for unknown dynamics.
Abstract
Adaptive control problems are notoriously difficult to solve even in the presence of plant-specific controllers. One way to bypass the intractable computation of the optimal policy is to restate the adaptive control problem as the minimization of the relative entropy of a controller that ignores the true plant dynamics from an informed controller. The solution is given by the Bayesian control rule, a set of equations characterizing a stochastic adaptive controller for the class of possible plant dynamics. Here, the Bayesian control rule is applied to derive BCR-MDP, a controller for solving undiscounted Markov decision processes with finite state and action spaces and unknown dynamics. In particular, we derive a non-parametric conjugate prior distribution over the policy space that encapsulates the agent's whole relevant history, and we present a Gibbs sampler to draw random policies from this…
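The idea of a stochastic adaptive controller over a class of possible plants can be illustrated with a toy loop: sample one candidate plant from the current belief, act as the controller tailored to that plant would, then update the belief from the observation alone. The sketch below is not BCR-MDP. It assumes a hypothetical two-armed Bernoulli bandit with two hand-written hypotheses instead of an MDP, and it uses exact Bayesian updating over a finite set instead of the paper's non-parametric prior and Gibbs sampler. All names (HYPOTHESES, POLICIES, likelihood, run_demo) are invented for illustration.

```python
import random

# Hypothetical toy plant: a two-armed Bernoulli bandit. Each hypothesis maps
# an arm index (0 or 1) to its assumed payout probability.
HYPOTHESES = {
    "left_good": (0.9, 0.1),
    "right_good": (0.1, 0.9),
}

# Controller tailored to each hypothesis: pull the arm that hypothesis prefers.
POLICIES = {"left_good": 0, "right_good": 1}


def likelihood(hypothesis, arm, reward):
    """Probability of observing `reward` (0 or 1) on `arm` under `hypothesis`."""
    p = HYPOTHESES[hypothesis][arm]
    return p if reward == 1 else 1.0 - p


def run_demo(steps=200, seed=0, true_plant="right_good"):
    """Run the sample-act-update loop and return the final posterior."""
    rng = random.Random(seed)
    posterior = {name: 0.5 for name in HYPOTHESES}  # uniform prior
    for _ in range(steps):
        names = list(posterior)
        weights = [posterior[n] for n in names]
        # 1. Sample a candidate plant from the current belief.
        sampled = rng.choices(names, weights=weights)[0]
        # 2. Act as the controller tailored to that candidate would.
        arm = POLICIES[sampled]
        # 3. The true (unknown) plant produces the observation.
        reward = 1 if rng.random() < HYPOTHESES[true_plant][arm] else 0
        # 4. Update the belief from the observation only. The action was
        #    chosen by the controller itself, so it is not used as evidence.
        unnorm = {n: posterior[n] * likelihood(n, arm, reward) for n in names}
        total = sum(unnorm.values())
        posterior = {n: v / total for n, v in unnorm.items()}
    return posterior
```

Calling run_demo() returns a dictionary with one probability per hypothesis. Under these assumptions the belief should concentrate on the true plant as observations accumulate, and the sampled controller then matches the informed one most of the time.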
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Advanced Control Systems Optimization · Reinforcement Learning in Robotics
