
TL;DR
This paper develops a formal criterion for automatically identifying suitable Markov Decision Process (MDP) representations from complex sequences of observations, actions, and rewards, and integrates it into a single learning algorithm. Extensions to dynamic Bayesian networks are developed in a companion article.
Contribution
It introduces a formal objective criterion for extracting MDPs from complex data and integrates it into one learning algorithm. This is a step toward automating the search for state representations, which so far has been done by human designers.
Findings
Provides a formal criterion for MDP extraction
Develops a unified learning algorithm incorporating the criterion
Points to extensions to dynamic Bayesian networks, developed in a companion article
Abstract
General-purpose intelligent learning agents cycle through (complex, non-MDP) sequences of observations, actions, and rewards. On the other hand, reinforcement learning is well-developed for small finite state Markov Decision Processes (MDPs). So far it is an art performed by human designers to extract the right state representation out of the bare observations, i.e. to reduce the agent setup to the MDP framework. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in a companion article.
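The abstract does not state the criterion itself. As a rough illustration of what a formal objective criterion for choosing a state representation can look like, the Python sketch below scores a candidate map `phi` from observation histories to states by a two-part code length: the bits needed to encode the induced state and reward sequences given the previous state and action, plus a BIC-style penalty for the number of free parameters. This is an assumption made for illustration, not the article's exact definition. The names `code_length` and `phi_cost`, the BIC penalty, and the timing convention are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def code_length(symbols, contexts, alphabet_size):
    """Approximate bits to encode `symbols` given parallel `contexts`.

    Data term: maximum-likelihood code length, i.e. for each context c,
    n_c times the empirical entropy of the symbols observed in c.
    Model term (hypothetical BIC-style penalty): (#params / 2) * log2(N),
    with (alphabet_size - 1) free parameters per distinct context.
    """
    n_total = len(symbols)
    if n_total == 0:
        return 0.0
    by_ctx = defaultdict(Counter)
    for ctx, sym in zip(contexts, symbols):
        by_ctx[ctx][sym] += 1
    data_bits = 0.0
    for counts in by_ctx.values():
        n_c = sum(counts.values())
        data_bits -= sum(k * math.log2(k / n_c) for k in counts.values())
    n_params = len(by_ctx) * (alphabet_size - 1)
    return data_bits + n_params / 2 * math.log2(n_total)


def phi_cost(phi, obs, actions, rewards):
    """Hypothetical score for a feature map `phi` (history tuple -> state).

    Assumed timing: at step t the agent sees obs[t] and rewards[t], then
    takes actions[t]; s_{t+1} and r_{t+1} are coded given (s_t, a_t).
    Lower is better.
    """
    states = [phi(tuple(obs[: t + 1])) for t in range(len(obs))]
    ctx = list(zip(states[:-1], actions[:-1]))
    state_bits = code_length(states[1:], ctx, len(set(states)))
    reward_bits = code_length(rewards[1:], ctx, len(set(rewards)))
    return state_bits + reward_bits


# Toy history: observations alternate 0, 1, 0, 1, ...; a single dummy
# action; the reward equals the current observation.
obs = [t % 2 for t in range(20)]
actions = [0] * 20
rewards = list(obs)

candidates = {
    "last_obs": lambda h: h[-1],  # state = most recent observation
    "constant": lambda h: 0,      # collapse everything into one state
    "full_history": lambda h: h,  # every history is its own state
}
for name, phi in candidates.items():
    print(name, round(phi_cost(phi, obs, actions, rewards), 2))
```

On this toy history the last-observation map scores lowest. The constant map leaves the rewards unpredictable and pays in data bits. The full-history map predicts everything perfectly but pays for a separate parameter set per history. That trade-off between predictive fit and model size is the kind of balance an objective criterion for selecting MDP representations has to strike.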
Taxonomy
Topics: Reinforcement Learning in Robotics · Bayesian Modeling and Causal Inference · Data Stream Mining Techniques
