Functional Stability of Discounted Markov Decision Processes Using Economic MPC Dissipativity Theory
Arash Bahari Kordabad, Sebastien Gros

TL;DR
This paper extends dissipativity theory from Economic Model Predictive Control to analyze the stability of Markov Decision Processes under discounted optimal policies, using Q-learning to compute the storage functionals that the analysis requires.
Contribution
It introduces new dissipativity conditions in the space of probability measures that ensure MDP stability, and proposes a practical Q-learning-based method to compute storage functionals.
Findings
New dissipativity conditions ensure MDP stability in the discounted setting.
Finite-horizon optimal control problems generate valid storage functionals.
Q-learning is proposed as a practical way to compute storage functionals for stability analysis.
Abstract
This paper discusses the functional stability of closed-loop Markov chains under optimal policies resulting from a discounted optimality criterion, which together form Markov Decision Processes (MDPs). We investigate the stability of MDPs in the sense of the probability measures (densities) underlying the state distributions, and extend the dissipativity theory of Economic Model Predictive Control to characterize MDP stability. This theory requires a so-called storage function satisfying a dissipativity inequality. In the space of probability measures and for the discounted setting, we introduce new dissipativity conditions that ensure MDP stability. We then use finite-horizon optimal control problems to generate valid storage functionals. In practice, we propose to use Q-learning to compute the storage functionals.
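To make "stability in the sense of probability measures" concrete, the sketch below runs standard tabular Q-learning on a tiny, entirely hypothetical 3-state, 2-action cost-minimizing MDP, extracts the greedy policy, and pushes two different initial state distributions through the resulting closed-loop Markov chain. It is only an illustration under stated assumptions: the transition table `P`, costs `C`, discount `GAMMA`, and the helper names are invented for this example. It does not compute the paper's storage functionals or check its dissipativity conditions. It only shows the property those conditions are meant to certify, namely that the state distributions converge to a common limit.

```python
import random

# Hypothetical MDP (not from the paper). P[s][a][t] = Prob(next = t | s, a).
# All entries are strictly positive, so every stationary policy yields an
# irreducible, aperiodic closed-loop chain.
P = [
    [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2]],
    [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]],
    [[0.6, 0.2, 0.2], [0.1, 0.1, 0.8]],
]
# C[s][a] = stage cost of action a in state s (to be minimized).
C = [[0.0, 1.0], [2.0, 1.5], [3.0, 2.5]]
GAMMA = 0.9
N_S, N_A = 3, 2


def q_learning(n_steps=200_000, eps=0.2, seed=0):
    """Tabular Q-learning for a discounted, cost-minimizing MDP."""
    rng = random.Random(seed)
    Q = [[0.0] * N_A for _ in range(N_S)]
    visits = [[0] * N_A for _ in range(N_S)]
    s = 0
    for _ in range(n_steps):
        # epsilon-greedy exploration; "greedy" means lowest estimated cost
        if rng.random() < eps:
            a = rng.randrange(N_A)
        else:
            a = min(range(N_A), key=lambda b: Q[s][b])
        s_next = rng.choices(range(N_S), weights=P[s][a])[0]
        visits[s][a] += 1
        alpha = 1.0 / visits[s][a] ** 0.7  # decaying (Robbins-Monro) step size
        target = C[s][a] + GAMMA * min(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s_next
    return Q


def greedy_policy(Q):
    """Deterministic policy picking the lowest-cost action in each state."""
    return [min(range(N_A), key=lambda a: Q[s][a]) for s in range(N_S)]


def propagate(mu, policy, n=50):
    """Evolve a state distribution through the closed-loop chain:
    mu_{k+1}(t) = sum_s mu_k(s) * P[s][policy[s]][t]."""
    for _ in range(n):
        mu = [sum(mu[s] * P[s][policy[s]][t] for s in range(N_S))
              for t in range(N_S)]
    return mu


if __name__ == "__main__":
    pi = greedy_policy(q_learning())
    mu_a = propagate([1.0, 0.0, 0.0], pi)  # start surely in state 0
    mu_b = propagate([0.0, 0.0, 1.0], pi)  # start surely in state 2
    print("policy:", pi)
    print("distribution from state 0:", mu_a)
    print("distribution from state 2:", mu_b)
```

Because every transition probability here is at least 0.1, the closed-loop chain contracts in total variation for any policy, so the two distributions agree to many digits after 50 steps. The paper's setting is far more general; this example is only meant to ground the vocabulary.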
Taxonomy
Topics: Advanced Control Systems Optimization
Methods: Q-Learning
