Efficient Inference in Markov Control Problems
Thomas Furmston, David Barber

TL;DR
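This paper introduces an exact inference algorithm for finite horizon Markov control problems that is more efficient than the classical forward-backward recursions, and extends it to infinite horizon Markov Decision Problems, yielding new policy gradient and Expectation Maximisation (EM) algorithms.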
Contribution
It presents a novel exact algorithm for computing the marginals of the reward weighted trajectory distribution in finite horizon problems, and a principled extension of that algorithm to infinite horizon Markov Decision Problems.
Findings
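Exact marginal inference in the finite horizon case that is more efficient than the standard forward-backward approach.
A principled extension to infinite horizon problems that explicitly accounts for the infinite horizon.
A novel algorithm for both policy gradients and EM in infinite horizon problems, built on that extension.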
Abstract
Markov control algorithms that perform smooth, non-greedy updates of the policy have been shown to be very general and versatile, with policy gradient and Expectation Maximisation algorithms being particularly popular. For these algorithms, marginal inference of the reward weighted trajectory distribution is required to perform policy updates. We discuss a new exact inference algorithm for these marginals in the finite horizon case that is more efficient than the standard approach based on classical forward-backward recursions. We also provide a principled extension to infinite horizon Markov Decision Problems that explicitly accounts for an infinite horizon. This extension provides a novel algorithm for both policy gradients and Expectation Maximisation in infinite horizon problems.
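To make the quantity in the abstract concrete, the following is a minimal sketch of how reward weighted state-action marginals can be accumulated for a small tabular, finite horizon problem using a forward pass over state marginals and a backward pass over expected future reward. It is an illustration only, not the algorithm proposed in the paper and not necessarily the exact baseline it compares against. The function names, the list-based layout of `P`, `R` and `pi`, the stationary policy, the undiscounted reward and the toy two-state problem are all assumptions made for this example.

```python
def forward_messages(p0, P, pi, H):
    """alpha[t][s] = probability of being in state s at time t under policy pi."""
    S, A = len(p0), len(pi[0])
    alpha = [list(p0)]
    for _ in range(H - 1):
        prev = alpha[-1]
        nxt = [0.0] * S
        for s in range(S):
            for a in range(A):
                w = prev[s] * pi[s][a]
                for s2 in range(S):
                    nxt[s2] += w * P[s][a][s2]
        alpha.append(nxt)
    return alpha


def backward_values(P, R, pi, H):
    """Q[t][s][a] = expected sum of rewards from time t to H-1 given (s_t, a_t)."""
    S, A = len(R), len(R[0])
    Q = [None] * H
    Q[H - 1] = [row[:] for row in R]
    for t in range(H - 2, -1, -1):
        # Expected future reward of each next state under the policy.
        v_next = [sum(pi[s][a] * Q[t + 1][s][a] for a in range(A)) for s in range(S)]
        Q[t] = [
            [R[s][a] + sum(P[s][a][s2] * v_next[s2] for s2 in range(S)) for a in range(A)]
            for s in range(S)
        ]
    return Q


def reward_weighted_marginals(p0, P, R, pi, H):
    """Unnormalised W[s][a] = sum_t alpha[t][s] * pi[s][a] * Q[t][s][a]."""
    alpha = forward_messages(p0, P, pi, H)
    Q = backward_values(P, R, pi, H)
    S, A = len(R), len(R[0])
    W = [
        [sum(alpha[t][s] * pi[s][a] * Q[t][s][a] for t in range(H)) for a in range(A)]
        for s in range(S)
    ]
    # Expected total reward, read off from the first time step.
    J = sum(alpha[0][s] * pi[s][a] * Q[0][s][a] for s in range(S) for a in range(A))
    return W, J


# Hypothetical toy problem: action 0 stays, action 1 switches state;
# reward 1 is received whenever the agent is in state 1.
p0 = [1.0, 0.0]
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0], [1.0, 1.0]]
pi = [[0.5, 0.5], [0.5, 0.5]]

W, J = reward_weighted_marginals(p0, P, R, pi, H=3)
```

In an EM-style update for a tabular policy, the new `pi[s][a]` would be proportional to `W[s][a]` within each state `s`, assuming non-negative rewards; a policy gradient can be assembled from the same `alpha` and `Q` terms.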
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Optimization and Search Problems
