Feature Markov Decision Processes

Marcus Hutter

arXiv:0812.4580 · cs.AI · December 30, 2009

Open Access

TL;DR

This paper develops a formal objective criterion for selecting a suitable Markov Decision Process (MDP) state representation from complex, non-MDP sequences of observations, actions, and rewards, and integrates the parts into one learning algorithm. Extensions to dynamic Bayesian networks are developed in a companion article.

Contribution

It introduces a formal objective criterion for extracting an MDP state representation from bare observations, a task so far performed by human designers, and integrates the criterion into a single learning algorithm.

Findings

01

Provides a formal criterion for MDP extraction

02

Develops a unified learning algorithm incorporating the criterion

03

Defers extensions to more realistic dynamic Bayesian networks to a companion article

Abstract

General-purpose intelligent learning agents cycle through (complex, non-MDP) sequences of observations, actions, and rewards. On the other hand, reinforcement learning is well-developed for small finite-state Markov Decision Processes (MDPs). So far it is an art performed by human designers to extract the right state representation out of the bare observations, i.e., to reduce the agent setup to the MDP framework. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in a companion article.
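
The abstract does not state the criterion itself. As a rough illustration of what an objective criterion for choosing a state representation could look like, the sketch below scores candidate feature maps by an approximate code length of the induced state and reward sequences (an MDL-style stand-in chosen for this example, not necessarily the criterion the paper develops). The names `code_length`, `cost`, `phi_const`, `phi_x`, and `phi_full`, the toy environment, and the convention that each history entry is an (observation, action, reward) triple are all assumptions made here.

```python
import math
import random
from collections import Counter, defaultdict


def code_length(counts):
    """Approximate bits to encode an i.i.d. sequence with these symbol counts:
    n times the empirical entropy, plus a (m - 1) / 2 * log2(n) penalty
    for the m symbols that actually occur (an assumed MDL-style estimate)."""
    nonzero = [c for c in counts if c > 0]
    n = sum(nonzero)
    if n == 0:
        return 0.0
    entropy_bits = -sum(c * math.log2(c / n) for c in nonzero)
    penalty = 0.5 * (len(nonzero) - 1) * math.log2(n)
    return entropy_bits + penalty


def cost(phi, history):
    """Score a hypothetical feature map phi (history prefix -> state).
    Lower is better: states and rewards are both cheap to encode."""
    states = [phi(history[: t + 1]) for t in range(len(history))]
    transitions = defaultdict(Counter)  # (s, a) -> counts of next state
    rewards = defaultdict(Counter)      # (s, a, s') -> counts of next reward
    for t in range(len(history) - 1):
        s, a = states[t], history[t][1]
        s_next, r_next = states[t + 1], history[t + 1][2]
        transitions[(s, a)][s_next] += 1
        rewards[(s, a, s_next)][r_next] += 1
    total = sum(code_length(c.values()) for c in transitions.values())
    total += sum(code_length(c.values()) for c in rewards.values())
    return total


# Toy environment (assumed): each observation is (x, noise), where x
# alternates 0, 1, 0, 1, ... and noise is an irrelevant random bit.
# The reward at step t equals x at step t - 1; there is a single action 0.
rng = random.Random(0)
history = []
for t in range(200):
    obs = (t % 2, rng.randint(0, 1))
    reward = (t - 1) % 2 if t > 0 else 0
    history.append((obs, 0, reward))

phi_const = lambda h: 0           # too coarse: one state for everything
phi_x = lambda h: h[-1][0][0]     # keeps only the relevant bit
phi_full = lambda h: h[-1][0]     # too fine: also tracks the noise bit

for name, phi in [("const", phi_const), ("x", phi_x), ("full", phi_full)]:
    print(name, round(cost(phi, history), 1))
```

In this toy setup the map that keeps only the relevant bit makes both the state transitions and the rewards deterministic, so it scores lowest. The constant map pays for unpredictable rewards, and the full-observation map pays for encoding the noise bit in every transition.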

Taxonomy

Topics: Reinforcement Learning in Robotics · Bayesian Modeling and Causal Inference · Data Stream Mining Techniques