Feature Reinforcement Learning: Part I: Unstructured MDPs

Marcus Hutter

arXiv:0906.1713 · cs.LG · December 30, 2009

TL;DR

This paper introduces a formal criterion and a unified algorithm for automating the extraction of MDP representations from complex, non-Markovian observations to enhance reinforcement learning applications.

Contribution

The paper develops a formal objective criterion for automating the reduction of raw observations to MDP state representations and integrates it into a single learning algorithm, which widens the range of problems that existing RL algorithms can address.

Findings

01. Formal criterion for state reduction in RL
02. Unified algorithm for MDP extraction
03. Extension to dynamic Bayesian networks

Abstract

General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well-developed for small finite state Markov decision processes (MDPs). Up to now, extracting the right state representations out of bare observations, that is, reducing the general agent setup to the MDP framework, is an art that involves significant effort by designers. The primary goal of this work is to automate the reduction process and thereby significantly expand the scope of many existing reinforcement learning algorithms and the agents that employ them. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one…
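To make the idea of "reducing the general agent setup to the MDP framework" concrete, here is a minimal sketch in Python. It is not the criterion or algorithm developed in the paper. It assumes a hypothetical feature map that takes the last `k` observations as the state, and it estimates transition probabilities and mean rewards by simple counting. The function name `estimate_mdp`, the toy observation sequence, and the single action `"go"` are all illustrative assumptions.

```python
from collections import Counter, defaultdict

def estimate_mdp(steps, k):
    """Estimate an empirical MDP from a history of (observation, reward, action) steps.

    Assumption (not from the paper): the state at time t is the tuple of the
    last k observations up to and including t.
    """
    obs = [o for (o, _, _) in steps]
    # Hypothetical feature map: history -> last-k observation window.
    states = [tuple(obs[max(0, t + 1 - k): t + 1]) for t in range(len(steps))]

    counts = defaultdict(Counter)      # (state, action) -> Counter of next states
    reward_sums = defaultdict(float)   # (state, action) -> sum of next rewards
    for t in range(len(steps) - 1):
        s, a = states[t], steps[t][2]
        s_next, r_next = states[t + 1], steps[t + 1][1]
        counts[(s, a)][s_next] += 1
        reward_sums[(s, a)] += r_next

    # Normalize counts into relative frequencies and mean rewards.
    transitions = {
        sa: {s2: n / sum(c.values()) for s2, n in c.items()}
        for sa, c in counts.items()
    }
    rewards = {sa: reward_sums[sa] / sum(counts[sa].values()) for sa in counts}
    return transitions, rewards

# Toy history: observations repeat A, A, B; reward 1.0 whenever B is observed.
steps = [(o, 1.0 if o == "B" else 0.0, "go") for o in "AABAABAAB"]

t1, r1 = estimate_mdp(steps, k=1)
t2, r2 = estimate_mdp(steps, k=2)
print(t1[(("A",), "go")])      # next state is uncertain when only one observation is kept
print(t2[(("A", "A"), "go")])  # next state is determined when two observations are kept
```

In this toy history the single-observation state is not Markov (after `A` the next observation is `A` or `B` with equal frequency), while the two-observation state makes the process deterministic. Choosing between such candidate maps by hand is the design effort the abstract refers to; an objective criterion would let that choice be scored automatically.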

Taxonomy

Topics: Bayesian Modeling and Causal Inference