Optimizing Local Satisfaction of Long-Run Average Objectives in Markov Decision Processes
David Klaška, Antonín Kučera, Vojtěch Kůr, Vít Musil, Vojtěch Řehák

TL;DR
This paper proposes an efficient algorithm for constructing policies in Markov decision processes that optimize long-run average objectives while avoiding local instability, where the frequency of state visits over a bounded time horizon differs significantly from the limit frequency.
Contribution
It presents an algorithmic approach for constructing MDP policies whose state visitation frequencies within a bounded time horizon stay close to the optimal limit frequencies.
Findings
The proposed algorithm reduces local instability, i.e., the deviation of bounded-horizon state visitation frequencies from their limit values.
It improves the optimality of steady-state policies in MDPs.
Experimental results demonstrate enhanced long-term performance.
Abstract
Long-run average optimization problems for Markov decision processes (MDPs) require constructing policies with optimal steady-state behavior, i.e., optimal limit frequency of visits to the states. However, such policies may suffer from local instability, i.e., the frequency of states visited in a bounded time horizon along a run differs significantly from the limit frequency. In this work, we propose an efficient algorithmic solution to this problem.
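The difference between limit frequency and bounded-horizon frequency can be illustrated with a minimal sketch. It is not the paper's algorithm: it only compares two hand-written runs over states 0 and 1 that visit each state equally often overall, yet behave very differently inside a short window. The function names, the example runs, and the use of the whole-run frequency as a stand-in for the limit frequency are assumptions made for illustration.

```python
def window_frequencies(run, state, window):
    """Frequency of `state` in every length-`window` slice of `run`."""
    return [
        run[i:i + window].count(state) / window
        for i in range(len(run) - window + 1)
    ]


def max_local_deviation(run, state, window):
    """Largest gap between a windowed frequency and the whole-run frequency.

    The whole-run frequency is used here as a finite stand-in for the
    limit frequency (an assumption of this sketch).
    """
    overall = run.count(state) / len(run)
    return max(abs(f - overall) for f in window_frequencies(run, state, window))


# Two hypothetical runs with the same overall frequency of state 0 (0.5).
alternating = [0, 1] * 100                  # 0, 1, 0, 1, ...
blocky = ([0] * 50 + [1] * 50) * 2          # long stretches in one state

stable = max_local_deviation(alternating, 0, window=10)
unstable = max_local_deviation(blocky, 0, window=10)
```

In the alternating run every window of length 10 contains state 0 exactly half the time, so the deviation is zero. In the blocky run some windows consist entirely of one state, so the windowed frequency is far from 0.5 even though the overall frequency is the same. This is the kind of gap the abstract calls local instability.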
Taxonomy
Topics: Reinforcement Learning in Robotics · Transportation and Mobility Innovations
