Ordinary Differential Equation Methods For Markov Decision Processes and Application to Kullback-Leibler Control Cost

Ana Bušić; Sean Meyn

arXiv:1605.04591 · math.OC · September 18, 2018

TL;DR

This paper introduces an ODE-based method that computes optimal policies for an entire family of MDPs at once. It focuses on models whose one-step reward includes a Kullback-Leibler control cost, and it still applies when the model includes uncontrollable natural randomness.

Contribution

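The paper develops an ODE approach that solves an entire family of MDPs, parameterized by a weighting factor $\zeta$ in the one-step reward. It then applies this approach to average-cost models with Kullback-Leibler control cost, including models whose uncontrollable randomness has no special structure.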
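Here is a derivation sketch for one special case. It shows how an ODE whose vector field is "based on a matrix inverse" can arise. The assumptions and notation are our own, not a restatement of the paper's general theorem. We take a finite state space, an irreducible nominal transition matrix $P_0$, a function $U$ on states, and an average-reward criterion. The one-step reward is assumed to be $\zeta U(x) - D(\check P(x,\cdot)\,\|\,P_0(x,\cdot))$, where $\check P$ is the controlled transition matrix. We write $\check P_h$ for the twisted matrix with $h$ in place of $h^*_\zeta$, and $\check\pi_h$ for its invariant distribution.

```latex
% Assumed (our notation): irreducible nominal matrix P_0, state function U,
% reward w_\zeta(x,\check P) = \zeta U(x) - D(\check P(x,\cdot) \,\|\, P_0(x,\cdot)).
\begin{aligned}
h^*_\zeta(x) + \eta^*_\zeta
  &= \max_{\check P}\Big\{ \zeta U(x) - D\big(\check P(x,\cdot)\,\|\,P_0(x,\cdot)\big)
       + \textstyle\sum_y \check P(x,y)\, h^*_\zeta(y) \Big\} \\
  &= \zeta U(x) + \log \textstyle\sum_y P_0(x,y)\, e^{h^*_\zeta(y)},
  \qquad
  \check P_\zeta(x,y) = \frac{P_0(x,y)\, e^{h^*_\zeta(y)}}{\sum_{y'} P_0(x,y')\, e^{h^*_\zeta(y')}} ,\\[6pt]
\tfrac{d}{d\zeta} h^*_\zeta + \big(\tfrac{d}{d\zeta}\eta^*_\zeta\big)\mathbf{1}
  &= U + \check P_\zeta\, \tfrac{d}{d\zeta} h^*_\zeta ,
  \qquad \tfrac{d}{d\zeta}\eta^*_\zeta = \check\pi_\zeta(U), \\[6pt]
\tfrac{d}{d\zeta} h^*_\zeta
  &= V(h^*_\zeta), \qquad
  V(h) := \big[ I - \check P_h + \mathbf{1}\otimes\check\pi_h \big]^{-1}\big( U - \check\pi_h(U)\,\mathbf{1} \big),
  \qquad h^*_0 = 0 .
\end{aligned}
```

The second line uses the Gibbs (Donsker-Varadhan) variational formula for the maximum over probability vectors. Exponentiating that line gives $e^{\eta^*_\zeta} e^{h^*_\zeta} = \mathrm{diag}(e^{\zeta U}) P_0\, e^{h^*_\zeta}$, which is a Perron-Frobenius eigenvector equation. Summing the differentiated equation against $\check\pi_\zeta$ yields $\frac{d}{d\zeta}\eta^*_\zeta = \check\pi_\zeta(U)$. The matrix inverse then selects the solution of Poisson's equation normalized by $\check\pi_h(V(h)) = 0$. At $\zeta = 0$ the maximizer is $\check P = P_0$, which has zero divergence, so $h^*_0 = 0$.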

Findings

01

The ODE approach computes the value functions for an entire parameterized family of MDPs, not just a single model.

02

The framework extends to models with uncontrollable natural randomness.

03

The solution method remains practical even when Perron-Frobenius theory does not apply.

Abstract

A new approach to the computation of optimal policies for MDP (Markov decision process) models is introduced. The main idea is to solve not one but an entire family of MDPs, parameterized by a weighting factor $\zeta$ that appears in the one-step reward function. For an MDP with $d$ states, the family of value functions $\{h_{\zeta}^{*} : \zeta \in \Re\}$ is the solution to an ODE, $\frac{d}{d\zeta} h_{\zeta}^{*} = V(h_{\zeta}^{*})$, where the vector field $V : \Re^{d} \to \Re^{d}$ has a simple form based on a matrix inverse. This general methodology is applied to a family of average-cost optimal control models in which the one-step reward function is defined by Kullback-Leibler divergence. In prior work, the motivation for this reward function is computation: the solution to the MDP can be expressed in terms of the Perron-Frobenius eigenvector of an associated positive matrix. The…
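The eigenvector characterization and the ODE can be checked against each other numerically. This is a minimal sketch that assumes a one-step reward $\zeta U(x) - D(\check P(x,\cdot)\,\|\,P_0(x,\cdot))$ and a small irreducible nominal matrix $P_0$ chosen for illustration. Under that assumption, $h^*_\zeta$ is the log of the Perron-Frobenius eigenvector of $\mathrm{diag}(e^{\zeta U})P_0$, normalized so that $h(0)=0$, and $\eta^*_\zeta$ is the log of its eigenvalue. The vector field is assumed to be $V(h) = [I - \check P_h + \mathbf{1}\otimes\check\pi_h]^{-1}(U - \check\pi_h(U)\mathbf{1})$. Here $\check P_h(x,y) \propto P_0(x,y)e^{h(y)}$, $\check\pi_h$ is the invariant distribution of $\check P_h$, and the integration starts from $h^*_0 = 0$. The helper names (`pf_solution`, `twisted`, `invariant`, `vector_field`, `solve_ode`) are hypothetical. The fixed-step RK4 integrator is our own choice, not necessarily the paper's algorithm.

```python
import numpy as np


def twisted(P0, h):
    """Twisted transition matrix: P_h(x, y) proportional to P0(x, y) * exp(h(y))."""
    W = P0 * np.exp(h)[None, :]
    return W / W.sum(axis=1, keepdims=True)


def invariant(P):
    """Invariant distribution of an irreducible stochastic matrix: pi P = pi, sum(pi) = 1."""
    d = P.shape[0]
    A = np.vstack([P.T - np.eye(d), np.ones((1, d))])
    b = np.zeros(d + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def pf_solution(P0, U, zeta):
    """h* = log of the Perron-Frobenius eigenvector of diag(exp(zeta U)) P0,
    normalized so h[0] = 0; eta* = log of the Perron-Frobenius eigenvalue."""
    P_hat = np.exp(zeta * U)[:, None] * P0
    vals, vecs = np.linalg.eig(P_hat)
    k = np.argmax(vals.real)        # PF eigenvalue has the largest real part
    v = np.abs(vecs[:, k].real)     # PF eigenvector entries share one sign
    h = np.log(v)
    return h - h[0], np.log(vals[k].real)


def vector_field(P0, U, h):
    """V(h) = [I - P_h + 1 (x) pi_h]^{-1} (U - pi_h(U) 1): the solution of Poisson's
    equation for the twisted chain P_h with forcing U, normalized by pi_h(V) = 0."""
    P = twisted(P0, h)
    pi = invariant(P)
    d = len(U)
    Z = np.linalg.inv(np.eye(d) - P + np.outer(np.ones(d), pi))
    return Z @ (U - pi @ U)


def solve_ode(P0, U, zeta_final, steps=400):
    """Integrate dh/dzeta = V(h) from h_0 = 0 using classical RK4 steps."""
    h = np.zeros(len(U))
    dz = zeta_final / steps
    for _ in range(steps):
        k1 = vector_field(P0, U, h)
        k2 = vector_field(P0, U, h + 0.5 * dz * k1)
        k3 = vector_field(P0, U, h + 0.5 * dz * k2)
        k4 = vector_field(P0, U, h + dz * k3)
        h = h + dz * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return h - h[0]                 # same normalization as pf_solution


# Hypothetical 3-state example: nominal dynamics P0 and state reward U.
P0 = np.array([[0.5, 0.5, 0.0],
               [0.25, 0.5, 0.25],
               [0.0, 0.5, 0.5]])
U = np.array([1.0, 0.0, -1.0])
zeta = 2.0

h_pf, eta_pf = pf_solution(P0, U, zeta)
h_ode = solve_ode(P0, U, zeta)
print("h* (eigenvector):", h_pf)
print("h* (ODE):        ", h_ode)
print("eta*:", eta_pf)
```

Both routes should agree up to integration error. The design difference is that the eigenvector route solves each $\zeta$ separately, while the ODE route produces the whole family $\{h^*_\zeta\}$ along the way as it steps from $\zeta = 0$.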
