Ordinary Differential Equation Methods For Markov Decision Processes and Application to Kullback-Leibler Control Cost
Ana Bušić, Sean Meyn

TL;DR
This paper introduces an ODE-based method that computes optimal policies for an entire family of MDPs at once, with a focus on Kullback-Leibler control costs, including models with uncontrollable natural randomness.
Contribution
The paper develops a novel ODE approach for solving entire families of MDPs parameterized by a weighting factor, extending to models with Kullback-Leibler costs and unstructured randomness.
Findings
The ODE approach efficiently computes value functions for parameterized MDPs.
The framework extends to models with uncontrollable natural randomness.
The method remains practical even when Perron-Frobenius theory does not apply.
Abstract
A new approach to computation of optimal policies for MDP (Markov decision process) models is introduced. The main idea is to solve not one, but an entire family of MDPs, parameterized by a weighting factor that appears in the one-step reward function. For an MDP with finitely many states, the family of value functions is the solution to an ODE, where the vector field has a simple form, based on a matrix inverse. This general methodology is applied to a family of average-cost optimal control models in which the one-step reward function is defined by Kullback-Leibler divergence. The motivation for this reward function in prior work is computation: The solution to the MDP can be expressed in terms of the Perron-Frobenius eigenvector for an associated positive matrix. The…
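To make the ODE idea concrete, here is a minimal numerical sketch. It is not taken from the paper. It assumes the simplest Kullback-Leibler setting: a nominal transition matrix `P0`, a one-step reward of the form ζ·U(x) minus the KL divergence between the chosen transition law and `P0`, and no uncontrolled randomness. Under those assumptions the relative value function satisfies ζU(x) + log Σ P0(x,x′) exp(h(x′)) = h(x) + η, and differentiating in ζ gives a Poisson equation that one linear solve resolves. The names `P0`, `U`, `twisted_kernel`, `vector_field`, `solve_family`, and `perron_frobenius_value` are illustrative, and state 0 is used as an arbitrary normalization point.

```python
import numpy as np

def twisted_kernel(P0, h):
    """Row-normalize P0(x, x') * exp(h(x')): the assumed KL-optimal kernel for h."""
    W = P0 * np.exp(h)[None, :]
    return W / W.sum(axis=1, keepdims=True)

def vector_field(P0, U, h):
    """Right-hand side of dh/dzeta = V(h), obtained from a single matrix solve."""
    d = len(U)
    P = twisted_kernel(P0, h)
    A = np.eye(d) - P
    A[:, 0] += 1.0                 # I - P + (ones in column 0); invertible if P is irreducible
    g = np.linalg.solve(A, U)      # g solves Poisson's equation for P with forcing U
    return g - g[0]                # keep the normalization h(0) = 0 for every zeta

def solve_family(P0, U, zeta_max=1.0, steps=200):
    """Classical RK4 from h = 0 at zeta = 0, where the optimal kernel is P0 itself."""
    h = np.zeros(len(U))
    dz = zeta_max / steps
    for _ in range(steps):
        k1 = vector_field(P0, U, h)
        k2 = vector_field(P0, U, h + 0.5 * dz * k1)
        k3 = vector_field(P0, U, h + 0.5 * dz * k2)
        k4 = vector_field(P0, U, h + dz * k3)
        h = h + dz * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return h

def perron_frobenius_value(P0, U, zeta):
    """Cross-check: h = log v, with v the Perron-Frobenius eigenvector of diag(exp(zeta*U)) @ P0."""
    M = np.exp(zeta * U)[:, None] * P0
    eigvals, eigvecs = np.linalg.eig(M)
    k = np.argmax(eigvals.real)    # dominant eigenvalue of a positive matrix is real
    v = np.abs(eigvecs[:, k].real)
    h = np.log(v)
    return h - h[0]

# Small hypothetical example: 4 states, strictly positive nominal kernel.
rng = np.random.default_rng(0)
P0 = rng.random((4, 4)) + 0.1
P0 /= P0.sum(axis=1, keepdims=True)
U = np.array([0.0, 1.0, -0.5, 2.0])

h_ode = solve_family(P0, U, zeta_max=1.0)
h_pf = perron_frobenius_value(P0, U, zeta=1.0)
```

In this simple setting the eigenvector route is available, so it serves only as a consistency check on the ODE solution. The ODE integration also yields the value function at every intermediate ζ along the way, which is the point of solving the whole family at once.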
