Functional Stability of Discounted Markov Decision Processes Using Economic MPC Dissipativity Theory

Arash Bahari Kordabad; Sebastien Gros

arXiv:2203.16989 · eess.SY · April 1, 2022

Open Access

TL;DR

This paper extends dissipativity theory from Economic Model Predictive Control to analyze the stability of Markov Decision Processes under discounted optimal policies, using Q-learning to compute key functionals.

Contribution

It introduces new dissipativity conditions in probability measure space for MDP stability and proposes a practical method using Q-learning to compute storage functionals.

Findings

01. New dissipativity conditions ensure MDP stability in the discounted setting.

02. Finite-horizon optimal control problems generate valid storage functionals.

03. Q-learning is proposed as a practical way to compute the storage functionals used in the stability analysis.

Abstract

This paper discusses the functional stability of closed-loop Markov Chains under optimal policies resulting from a discounted optimality criterion, forming Markov Decision Processes (MDPs). We investigate the stability of MDPs in the sense of probability measures (densities) underlying the state distributions and extend the dissipativity theory of Economic Model Predictive Control in order to characterize the MDP stability. This theory requires a so-called storage function satisfying a dissipativity inequality. In the probability measures space and for the discounted setting, we introduce new dissipativity conditions ensuring the MDP stability. We then use finite-horizon optimal control problems in order to generate valid storage functionals. In practice, we propose to use Q-learning to compute the storage functionals.
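To make the Q-learning step mentioned in the abstract more concrete, the following is a minimal sketch of generic tabular Q-learning for a discounted, cost-minimizing MDP. It is not the paper's procedure for computing storage functionals. The three-state MDP (`NEXT_STATE`, `COST`), the discount factor `GAMMA`, and the learning rate `ALPHA` are hypothetical values chosen for illustration only.

```python
import random

GAMMA = 0.9   # discount factor (assumed for illustration)
ALPHA = 0.5   # learning rate (assumed for illustration)

# Hypothetical toy MDP with deterministic transitions.
# NEXT_STATE[s][a] is the successor state, COST[s][a] the stage cost.
NEXT_STATE = [[0, 1], [0, 2], [1, 2]]
COST = [[1.0, 0.5], [1.0, 0.2], [0.3, 0.0]]


def q_learning(num_steps=20000, seed=0):
    """Tabular Q-learning with uniformly random exploration (off-policy)."""
    rng = random.Random(seed)
    n_states, n_actions = len(NEXT_STATE), len(NEXT_STATE[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(num_steps):
        # Sample a state-action pair uniformly so every pair is visited often.
        s = rng.randrange(n_states)
        a = rng.randrange(n_actions)
        s_next = NEXT_STATE[s][a]
        # Temporal-difference target for a cost-minimizing discounted problem.
        target = COST[s][a] + GAMMA * min(q[s_next])
        q[s][a] += ALPHA * (target - q[s][a])
    return q


def greedy_policy(q):
    """Return the cost-minimizing action for each state."""
    return [min(range(len(row)), key=row.__getitem__) for row in q]


q_table = q_learning()
policy = greedy_policy(q_table)
print(policy)
```

In this toy problem the learned greedy policy moves toward state 2 and stays there, since that state has zero stage cost. How such a learned Q-function relates to a storage functional is the subject of the paper itself and is not reproduced in this sketch.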

Taxonomy

Topics: Advanced Control Systems Optimization

Methods: Q-Learning