Optimizing Local Satisfaction of Long-Run Average Objectives in Markov Decision Processes

David Klaška; Antonín Kučera; Vojtěch Kůr; Vít Musil; Vojtěch Řehák

arXiv:2312.12325 · cs.MA · December 20, 2023 · 1 citation

TL;DR

This paper introduces an efficient algorithm for optimizing long-run average objectives in Markov decision processes (MDPs), addressing the local instability of policies whose steady-state behavior is optimal.

Contribution

It presents a novel algorithmic approach that keeps the state visitation frequencies of MDP policies over bounded time horizons close to their long-run (limit) frequencies.

Findings

1. The proposed algorithm effectively stabilizes state visitation frequencies over bounded time horizons.
2. It improves how well policies with optimal steady-state behavior satisfy the objective locally, not only in the limit.
3. Experimental results demonstrate enhanced long-term performance.

Abstract

Long-run average optimization problems for Markov decision processes (MDPs) require constructing policies with optimal steady-state behavior, i.e., optimal limit frequency of visits to the states. However, such policies may suffer from local instability, i.e., the frequency of states visited in a bounded time horizon along a run differs significantly from the limit frequency. In this work, we propose an efficient algorithmic solution to this problem.
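To make "local instability" concrete, here is a small illustrative sketch. It is not the authors' algorithm, which this page does not describe, and it only measures the phenomenon rather than optimizing anything. It assumes a toy two-state Markov chain (A and B), as might be induced by fixing a policy in an MDP, and compares the limit frequency of A with its frequency in every window of k consecutive steps. The function names `stationary_two_state`, `simulate`, and `max_window_deviation`, the switching probabilities, and the window length 10 are hypothetical choices made for illustration.

```python
import random


def stationary_two_state(p_ab: float, p_ba: float) -> tuple[float, float]:
    """Limit (steady-state) frequencies of states A and B in a two-state chain
    that moves A -> B with probability p_ab and B -> A with probability p_ba."""
    total = p_ab + p_ba
    return p_ba / total, p_ab / total


def simulate(p_ab: float, p_ba: float, steps: int, seed: int = 0) -> list[int]:
    """Sample one run of the chain as a list of states (0 = A, 1 = B), starting in A."""
    rng = random.Random(seed)
    state, run = 0, []
    for _ in range(steps):
        run.append(state)
        switch_prob = p_ab if state == 0 else p_ba
        if rng.random() < switch_prob:
            state = 1 - state
    return run


def max_window_deviation(run: list[int], window: int, limit_freq_a: float) -> float:
    """Largest gap, over all windows of `window` consecutive steps, between the
    local frequency of A in that window and its limit frequency."""
    count_a = run[:window].count(0)
    worst = abs(count_a / window - limit_freq_a)
    for start in range(1, len(run) - window + 1):
        # Slide the window one step: drop run[start - 1], add run[start + window - 1].
        count_a += (run[start + window - 1] == 0) - (run[start - 1] == 0)
        worst = max(worst, abs(count_a / window - limit_freq_a))
    return worst


# Two hypothetical policies with the same steady state (A and B each visited 1/2 of the time).
for name, p in [("sticky", 0.01), ("alternating", 1.0)]:
    pi_a, _ = stationary_two_state(p, p)
    run = simulate(p, p, steps=10_000)
    print(name, pi_a, max_window_deviation(run, window=10, limit_freq_a=pi_a))
```

Both chains have limit frequency 1/2 for A, so they are indistinguishable by a long-run average objective. The alternating chain never deviates within any 10-step window, whereas the sticky one spends long stretches in a single state, so some windows contain only A or only B and the deviation approaches its maximum of 0.5. That gap between bounded-horizon and limit frequencies is the local instability the abstract refers to.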

Code & Models

Repositories

https://gitlab.fi.muni.cz/formela/2024-aaai-long-run-average-mdp.git (official)

Videos

Optimizing Local Satisfaction of Long-Run Average Objectives in Markov Decision Processes (Underline)

Taxonomy

Topics: Reinforcement Learning in Robotics · Transportation and Mobility Innovations