Learning a Machine for the Decision in a Partially Observable Markov Universe
Frederic Dambreville (DGA/CTA/DT/GIP)

TL;DR
This paper introduces a method for learning optimal decision strategies in partially observable Markov environments by approximating strategic trees with parameterized hidden Markov models and optimizing them using the cross-entropy principle.
Contribution
It proposes a novel approach that directly approximates strategic decision trees with parameterized HMMs and introduces a cross-entropy based optimization method for these models.
Findings
Effective approximation of strategic trees in POMDPs
Successful application of cross-entropy optimization to HMM parameters
Improved decision-making performance in partially observable environments
Abstract
We study optimal decision-making in a partially observable Markov universe. Our viewpoint departs from dynamic programming: we directly approximate an optimal strategic tree that depends on the observations. The approximation is made by means of a parameterized probabilistic law. A particular family of hidden Markov models (HMMs), with input and output, is considered as the learning framework. A method for optimizing the parameters of these HMMs is proposed and applied. This optimization method is based on the cross-entropy principle.
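The abstract describes the general idea: sample strategies from a parameterized probabilistic law, then adjust that law using the cross-entropy principle. The sketch below illustrates that principle on a deliberately simplified toy problem and is not the paper's algorithm. The law here is a memoryless table of action probabilities per observation rather than an HMM with input and output, and `TARGET`, `toy_reward`, and `cross_entropy_search` are hypothetical names introduced only for illustration. The toy reward stands in for the expected reward of a strategy.

```python
import random

def cross_entropy_search(n_obs, n_actions, reward, n_samples=200,
                         elite_frac=0.1, smoothing=0.7, n_iters=30, seed=0):
    """Cross-entropy search over a table probs[o][a] (assumed, simplified law)."""
    rng = random.Random(seed)
    # Start from the uniform law: every action equally likely per observation.
    probs = [[1.0 / n_actions] * n_actions for _ in range(n_obs)]
    n_elite = max(1, int(n_samples * elite_frac))
    for _ in range(n_iters):
        # 1. Sample candidate strategies (one action per observation).
        samples = [
            tuple(rng.choices(range(n_actions), weights=probs[o])[0]
                  for o in range(n_obs))
            for _ in range(n_samples)
        ]
        # 2. Keep the best-scoring fraction (the "elite" samples).
        samples.sort(key=reward, reverse=True)
        elite = samples[:n_elite]
        # 3. Move the law toward the elite action frequencies; for this
        #    categorical family, those frequencies are the cross-entropy
        #    minimizer. Smoothing keeps every probability strictly positive.
        for o in range(n_obs):
            for a in range(n_actions):
                freq = sum(1 for s in elite if s[o] == a) / n_elite
                probs[o][a] = smoothing * freq + (1 - smoothing) * probs[o][a]
    return probs

# Hypothetical toy objective: count positions that agree with a hidden target.
TARGET = (1, 0, 2, 1)

def toy_reward(strategy):
    return sum(1 for s, t in zip(strategy, TARGET) if s == t)

probs = cross_entropy_search(n_obs=4, n_actions=3, reward=toy_reward)
# Most likely action per observation under the learned law.
best = tuple(max(range(3), key=lambda a: row[a]) for row in probs)
print(best)
```

In a setting closer to the one described, the table would be replaced by the HMM parameters and `toy_reward` by a simulated evaluation of the sampled decision sequence; both substitutions are assumptions about how such a sketch would extend.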
Taxonomy
Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Reinforcement Learning in Robotics
