Efficient Inference in Markov Control Problems

Thomas Furmston; David Barber

arXiv:1202.3720 · cs.SY · February 20, 2012 · 2 cites

TL;DR

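This paper introduces an exact inference algorithm for the marginals of the reward-weighted trajectory distribution in finite horizon Markov control problems that is more efficient than classical forward-backward recursions. It also extends the approach to infinite horizon problems, yielding new policy gradient and Expectation Maximisation (EM) algorithms.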

Contribution

The paper presents a new exact inference algorithm for finite horizon problems that is more efficient than the standard forward-backward approach, and extends it to infinite horizon Markov Decision Problems.

Findings

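01. An exact inference algorithm for the finite horizon case that is more efficient than classical forward-backward recursions.

02. A principled extension of the algorithm to infinite horizon Markov Decision Problems.

03. A new algorithm for both policy gradients and Expectation Maximisation in infinite horizon problems.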

Abstract

Markov control algorithms that perform smooth, non-greedy updates of the policy have been shown to be very general and versatile, with policy gradient and Expectation Maximisation algorithms being particularly popular. For these algorithms, marginal inference of the reward weighted trajectory distribution is required to perform policy updates. We discuss a new exact inference algorithm for these marginals in the finite horizon case that is more efficient than the standard approach based on classical forward-backward recursions. We also provide a principled extension to infinite horizon Markov Decision Problems that explicitly accounts for an infinite horizon. This extension provides a novel algorithm for both policy gradients and Expectation Maximisation in infinite horizon problems.
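
To make the abstract's terminology concrete, the following is a minimal sketch of the classical forward and backward recursions for a small tabular finite horizon problem. It is not the paper's new algorithm. The function name `forward_backward`, the array layout, the stationary policy, and the toy numbers are all assumptions made for illustration. The forward pass gives the state marginals under the policy, the backward pass gives the expected reward-to-go, and their product is used here as an unnormalised reward-weighted state-action weight.

```python
def forward_backward(p1, P, R, pi, H):
    """Classical recursions for a tabular finite-horizon MDP (illustrative).

    p1[s]       : initial state distribution
    P[s][a][s2] : transition probability p(s2 | s, a)
    R[s][a]     : non-negative reward
    pi[s][a]    : stationary policy p(a | s) (an assumption of this sketch)
    H           : horizon (number of time steps)
    """
    S, A = len(p1), len(R[0])

    # Forward pass: alpha[t][s] = p(s_t = s) under the policy.
    alpha = [list(p1)]
    for _ in range(H - 1):
        prev = alpha[-1]
        nxt = [0.0] * S
        for s in range(S):
            for a in range(A):
                for s2 in range(S):
                    nxt[s2] += prev[s] * pi[s][a] * P[s][a][s2]
        alpha.append(nxt)

    # Backward pass: Q[t][s][a] = expected reward from step t onward.
    Q = [None] * H
    V_next = [0.0] * S  # value beyond the horizon is zero
    for t in reversed(range(H)):
        Q[t] = [[R[s][a] + sum(P[s][a][s2] * V_next[s2] for s2 in range(S))
                 for a in range(A)] for s in range(S)]
        V_next = [sum(pi[s][a] * Q[t][s][a] for a in range(A))
                  for s in range(S)]

    # Unnormalised reward-weighted state-action weights at each time step.
    w = [[[alpha[t][s] * pi[s][a] * Q[t][s][a] for a in range(A)]
          for s in range(S)] for t in range(H)]
    return alpha, Q, w


# Toy two-state, two-action problem (all numbers are made up).
p1 = [1.0, 0.0]
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[0.0, 0.0], [1.0, 1.0]]
pi = [[0.5, 0.5], [0.5, 0.5]]
H = 3

alpha, Q, w = forward_backward(p1, P, R, pi, H)

# Expected total reward, computed two ways as a consistency check.
U_backward = sum(p1[s] * pi[s][a] * Q[0][s][a]
                 for s in range(2) for a in range(2))
U_forward = sum(alpha[t][s] * pi[s][a] * R[s][a]
                for t in range(H) for s in range(2) for a in range(2))
```

The two values `U_backward` and `U_forward` should agree up to floating-point error, since both equal the expected total reward over the horizon. Plain Python lists are used so the sketch has no dependencies; an array library would be the natural choice for larger problems.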

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Optimization and Search Problems