Optimizing the Expected Mean Payoff in Energy Markov Decision Processes
Tomáš Brázdil, Antonín Kučera, Petr Novotný

TL;DR
This paper studies how to compute strategies in Energy Markov Decision Processes (EMDPs) that keep the counter non-negative while maximizing the expected mean payoff.
Contribution
It introduces methods for computing safe strategies in EMDPs that optimize expected mean payoff while maintaining non-negative energy levels.
Findings
Developed algorithms for safe strategy computation in EMDPs.
Proved the existence of optimal strategies under safety constraints.
Provided complexity analysis of the optimization problem.
Abstract
Energy Markov Decision Processes (EMDPs) are finite-state Markov decision processes where each transition is assigned an integer counter update and a rational payoff. An EMDP configuration is a pair s(n), where s is a control state and n is the current counter value. The configurations are changed by performing transitions in the standard way. We consider the problem of computing a safe strategy (i.e., a strategy that keeps the counter non-negative) which maximizes the expected mean payoff.
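To make the definitions in the abstract concrete, here is a minimal Python sketch of a toy EMDP and a simulation of one run. It is an illustration under stated assumptions, not the paper's algorithm: the states `s` and `t`, the actions `work` and `rest`, the two strategies, and the `simulate` helper are all hypothetical. The sketch only estimates the mean payoff along a single sampled run of finite length, whereas the paper's objective is the expected mean payoff over infinite runs.

```python
import random
from fractions import Fraction

# Hypothetical toy EMDP (not taken from the paper):
# state -> action -> list of (probability, next_state, counter_update, payoff).
# Counter updates are integers and payoffs are rationals, as in the abstract.
EMDP = {
    "s": {
        "work": [(0.5, "s", -1, Fraction(2)), (0.5, "t", -1, Fraction(4))],
        "rest": [(1.0, "s", +1, Fraction(0))],
    },
    "t": {
        "work": [(1.0, "s", -1, Fraction(1))],
        "rest": [(1.0, "s", +2, Fraction(0))],
    },
}


def cautious(state, counter):
    """Illustrative strategy: work only when the counter can absorb the -1 update."""
    return "work" if counter >= 1 else "rest"


def reckless(state, counter):
    """Illustrative unsafe strategy: always work, ignoring the counter."""
    return "work"


def simulate(emdp, strategy, state, counter, steps, seed=0):
    """Follow `strategy` from configuration state(counter) for `steps` transitions.

    Returns the average payoff per step along the sampled run, or None if the
    counter drops below zero (the run is not safe).
    """
    rng = random.Random(seed)
    total = Fraction(0)
    for _ in range(steps):
        action = strategy(state, counter)
        outcomes = emdp[state][action]
        weights = [p for p, _, _, _ in outcomes]
        _, state, update, payoff = rng.choices(outcomes, weights=weights)[0]
        counter += update
        if counter < 0:
            return None  # safety violated: counter became negative
        total += payoff
    return total / steps


if __name__ == "__main__":
    print("cautious:", simulate(EMDP, cautious, "s", 0, 1000))
    print("reckless:", simulate(EMDP, reckless, "s", 2, 10))
```

In this toy model every `work` transition decrements the counter by one, so the always-work strategy started in `s(2)` necessarily drives the counter negative on its third step, while the cautious strategy never does. A strategy here is a function of the whole configuration (state and counter), which reflects that the right choice may depend on the current counter value.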
Taxonomy
Topics: Formal Methods in Verification · Petri Nets in System Modeling · Advanced Battery Technologies Research
