Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes

Jordi Grau-Moya; Felix Leibfried; Tim Genewein; Daniel A. Braun

arXiv:1604.02080·cs.AI·April 8, 2016

TL;DR

This paper introduces a unified framework for MDP planning that incorporates information-processing constraints and model uncertainty, generalizing existing methods and providing a convergent value iteration scheme.

Contribution

It develops a generalized variational principle and value iteration method that unify standard, Bayesian, and robust MDP planning under a single framework.

Findings

1. The generalized value iteration scheme comes with a convergence proof.

2. Standard value iteration with a known model, Bayesian MDP planning, and robust planning all arise as limit cases of the scheme.

3. The benefits of the approach are demonstrated in a grid world simulation.

Abstract

Information-theoretic principles for learning and acting have been proposed to solve particular classes of Markov Decision Problems. Mathematically, such approaches are governed by a variational free energy principle and allow solving MDP planning problems with information-processing constraints expressed in terms of a Kullback-Leibler divergence with respect to a reference distribution. Here we consider a generalization of such MDP planners by taking model uncertainty into account. As model uncertainty can also be formalized as an information-processing constraint, we can derive a unified solution from a single generalized variational principle. We provide a generalized value iteration scheme together with a convergence proof. As limit cases, this generalized scheme includes standard value iteration with a known model, Bayesian MDP planning, and robust planning. We demonstrate the…
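
To make the structure described in the abstract concrete, the following is a minimal sketch of a nested "soft" value iteration on a tabular MDP. It is an illustration under stated assumptions, not the authors' implementation. The function names (`soft_expectation`, `generalized_value_iteration`), the finite set of candidate transition models `T`, the belief `mu` over those models, the reference policy `rho`, and the two parameters `alpha` (for the policy constraint) and `beta` (for the model constraint) are all hypothetical choices made here. The sketch relies on the standard result that maximizing expected utility under a Kullback-Leibler penalty yields a log-sum-exp certainty equivalent. The paper's exact formulation may differ, for instance in how the belief over models is conditioned.

```python
import numpy as np

def soft_expectation(values, weights, temp, axis):
    """Certainty equivalent (1/temp) * log(sum_i weights_i * exp(temp * values_i)).

    temp = 0 gives the weighted mean, temp = +inf the max, temp = -inf the min
    (the last two assume every weight is strictly positive).
    """
    if temp == 0:
        return np.sum(weights * values, axis=axis)
    if np.isinf(temp):
        return np.max(values, axis=axis) if temp > 0 else np.min(values, axis=axis)
    z = temp * values
    m = np.max(z, axis=axis, keepdims=True)  # shift for numerical stability
    log_sum = np.log(np.sum(weights * np.exp(z - m), axis=axis))
    return (np.squeeze(m, axis=axis) + log_sum) / temp

def generalized_value_iteration(T, R, mu, rho, alpha, beta,
                                gamma=0.9, tol=1e-8, max_iter=10_000):
    """Hypothetical sketch of value iteration with two KL-type constraints.

    T:   (K, S, A, S) transition probabilities of K candidate models (assumed)
    R:   (S, A) rewards
    mu:  (K,) belief over the candidate models (assumed state-independent)
    rho: (S, A) reference policy
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        # Action values under each candidate model, shape (K, S, A).
        Q_theta = R[None] + gamma * (T @ V)
        # Soft aggregation over models: beta = 0 averages, beta = -inf takes the worst case.
        Q = soft_expectation(Q_theta, mu[:, None, None], beta, axis=0)
        # Soft aggregation over actions: alpha = +inf recovers the usual max.
        V_new = soft_expectation(Q, rho, alpha, axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

# Toy problem with random numbers, purely for illustration.
rng = np.random.default_rng(0)
K, S, A = 2, 3, 2
T = rng.random((K, S, A, S))
T /= T.sum(axis=-1, keepdims=True)
R = rng.random((S, A))
mu = np.full(K, 1.0 / K)
rho = np.full((S, A), 1.0 / A)

V_bayes = generalized_value_iteration(T, R, mu, rho, alpha=np.inf, beta=0.0)
V_robust = generalized_value_iteration(T, R, mu, rho, alpha=np.inf, beta=-np.inf)
V_bounded = generalized_value_iteration(T, R, mu, rho, alpha=1.0, beta=-1.0)
```

In this sketch, `alpha=np.inf` with `beta=0.0` plans against the belief-averaged model, and `beta=-np.inf` plans against the worst candidate model. With a single candidate model and `alpha=np.inf`, the loop reduces to ordinary value iteration. Finite values of `alpha` and `beta` interpolate between these cases. The sign convention for `beta` is a choice of this sketch.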
