Feature Reinforcement Learning: Part I: Unstructured MDPs
Marcus Hutter

TL;DR
This paper introduces a formal criterion and a unified algorithm for automating the extraction of MDP representations from complex, non-Markovian observations to enhance reinforcement learning applications.
Contribution
The paper develops a formal objective criterion for choosing a reduced state representation, so that the reduction from raw observations to an MDP can be automated, and integrates the criterion into a single learning algorithm, broadening the applicability of RL.
Findings
Formal criterion for state reduction in RL
Unified algorithm for MDP extraction
Extension to dynamic Bayesian networks
Abstract
General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well-developed for small finite state Markov decision processes (MDPs). Up to now, extracting the right state representations out of bare observations, that is, reducing the general agent setup to the MDP framework, is an art that involves significant effort by designers. The primary goal of this work is to automate the reduction process and thereby significantly expand the scope of many existing reinforcement learning algorithms and the agents that employ them. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one…
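To make the idea of reducing a history of observations, actions, and rewards to an MDP concrete, here is a minimal sketch. It is not the criterion or algorithm developed in the paper. The feature map `phi_last_k` (state = last k observations), the function `estimate_mdp`, and the toy trajectory are all hypothetical names and data chosen for illustration, and the reward is assumed to arrive after each action.

```python
from collections import Counter, defaultdict


def phi_last_k(observations, k):
    """Hypothetical feature map: the state is the tuple of the last k observations."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return tuple(observations[-k:])


def estimate_mdp(trajectory, k):
    """Count the transitions that phi_last_k induces on a trajectory.

    trajectory: list of (observation, action, reward) triples, where the
    reward is assumed to be received after taking the action.
    Returns {(state, action): Counter({(next_state, reward): count})}.
    """
    observations = [o for o, _, _ in trajectory]
    counts = defaultdict(Counter)
    for t in range(len(trajectory) - 1):
        _, action, reward = trajectory[t]
        state = phi_last_k(observations[: t + 1], k)
        next_state = phi_last_k(observations[: t + 2], k)
        counts[(state, action)][(next_state, reward)] += 1
    return counts


# Toy trajectory (made up for illustration): observations alternate a, b, a, ...
trajectory = [("a", 0, 0.0), ("b", 1, 1.0), ("a", 0, 0.0), ("b", 1, 1.0), ("a", 0, 0.0)]
model = estimate_mdp(trajectory, k=1)
```

Different choices of feature map yield different candidate MDPs from the same history. A formal criterion of the kind the abstract calls for would be what ranks such candidates against each other; this sketch only builds the counts for one fixed choice.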
Taxonomy
Topics: Bayesian Modeling and Causal Inference
