Steady-State Planning in Expected Reward Multichain MDPs

George K. Atia; Andre Beckus; Ismail Alkhouri; Alvaro Velasquez

arXiv:2012.02178 · cs.AI · November 30, 2021

TL;DR

This paper introduces a linear programming approach for steady-state planning in multichain MDPs, enabling the synthesis of policies that satisfy specific long-term behavior constraints with formal guarantees.

Contribution

It proposes a novel linear programming method for steady-state planning in multichain MDPs, providing guarantees for stationary policies under general conditions.
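For orientation, here is a classical way to pose steady-state planning as a linear program in the single-chain (unichain) setting covered by standard average-reward MDP texts. It optimizes over long-run state-action frequencies $x_{s,a}$. The sketch adds illustrative bounds $\ell_s, u_s$ on the steady-state probability of the states in a hypothetical set $S_{\mathrm{spec}}$. This is not the paper's multichain formulation, and the symbols are assumptions rather than the authors' notation.

```latex
\begin{aligned}
\max_{x \ge 0}\quad & \sum_{s \in S}\sum_{a \in A} r(s,a)\, x_{s,a} \\
\text{s.t.}\quad & \sum_{a \in A} x_{s',a} = \sum_{s \in S}\sum_{a \in A} P(s' \mid s,a)\, x_{s,a}, && \forall s' \in S \quad \text{(balance)} \\
& \sum_{s \in S}\sum_{a \in A} x_{s,a} = 1, && \text{(normalization)} \\
& \ell_s \le \sum_{a \in A} x_{s,a} \le u_s, && \forall s \in S_{\mathrm{spec}} \quad \text{(steady-state bounds)}
\end{aligned}
```

A stationary policy can then be read off as $\pi(a \mid s) = x_{s,a} / \sum_{a'} x_{s,a'}$ wherever the denominator is positive. In a unichain model, that policy's steady-state distribution equals $\sum_a x_{s,a}$. In general multichain models this correspondence is not guaranteed, and that is the setting the paper's own LP targets.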

Findings

01. Linear programming solution for multichain MDPs
02. Guarantees on stationary policy behavior
03. Applicability to complex, disconnected systems

Abstract

The planning domain has experienced increased interest in the formal synthesis of decision-making policies. This formal synthesis typically entails finding a policy which satisfies formal specifications in the form of some well-defined logic. While many such logics have been proposed with varying degrees of expressiveness and complexity in their capacity to capture desirable agent behavior, their value is limited when deriving decision-making policies which satisfy certain types of asymptotic behavior in general system models. In particular, we are interested in specifying constraints on the steady-state behavior of an agent, which captures the proportion of time an agent spends in each state as it interacts for an indefinite period of time with its environment. This is sometimes called the average or expected behavior of the agent and the associated planning problem is faced with…
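The abstract defines steady-state behavior as the proportion of time an agent spends in each state over an indefinite horizon. The Python below is a minimal sketch of that quantity for a fixed stationary policy. It is not the authors' method, which synthesizes policies by linear programming. Given the transition matrix of the Markov chain the policy induces, it approximates each state's long-run fraction of time by averaging the state distribution over a long finite horizon. It then checks hypothetical lower/upper bounds on those fractions. The matrix `P`, the helpers `cesaro_occupancy` and `satisfies_bounds`, and the bounds are illustrative assumptions. The example chain is deliberately multichain, so the resulting proportions depend on where the agent starts.

```python
from typing import Dict, List, Sequence, Tuple


def cesaro_occupancy(
    P: Sequence[Sequence[float]],
    init: Sequence[float],
    horizon: int = 10_000,
) -> List[float]:
    """Approximate the long-run fraction of time spent in each state.

    P is the row-stochastic transition matrix of the Markov chain induced by
    a fixed stationary policy; init is the initial state distribution.
    Returns (1/horizon) * sum_{t=0}^{horizon-1} init @ P^t. This Cesaro
    average converges even for periodic chains, where P^t itself oscillates.
    """
    n = len(P)
    dist = list(init)
    total = [0.0] * n
    for _ in range(horizon):
        # Accumulate the distribution at the current time step.
        for i in range(n):
            total[i] += dist[i]
        # Advance one step of the chain: dist <- dist @ P
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return [t / horizon for t in total]


def satisfies_bounds(
    occ: Sequence[float],
    bounds: Dict[int, Tuple[float, float]],
    tol: float = 1e-3,
) -> bool:
    """Check hypothetical steady-state constraints lo <= occ[s] <= hi (up to tol)."""
    return all(lo - tol <= occ[s] <= hi + tol for s, (lo, hi) in bounds.items())


# Hypothetical multichain example: state 0 is transient and splits evenly into
# the periodic recurrent class {1, 2} and the absorbing recurrent class {3}.
P = [
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

from_0 = cesaro_occupancy(P, [1.0, 0.0, 0.0, 0.0])  # approx [0, 0.25, 0.25, 0.5]
from_1 = cesaro_occupancy(P, [0.0, 1.0, 0.0, 0.0])  # [0, 0.5, 0.5, 0]
from_3 = cesaro_occupancy(P, [0.0, 0.0, 0.0, 1.0])  # [0, 0, 0, 1]

print(satisfies_bounds(from_0, {3: (0.0, 0.6)}))  # True
print(satisfies_bounds(from_1, {3: (0.2, 1.0)}))  # False
```

Starting from state 0, the agent enters each recurrent class with probability 1/2, so it spends roughly half its time in state 3. Starting from state 1, it never reaches state 3. A steady-state constraint such as "spend at least 20% of the time in state 3" therefore holds or fails depending on the initial state, which illustrates why multiple recurrent classes complicate steady-state planning.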
