Optimal strategies in Markov decision processes with finitely additive evaluations
János Flesch, Arkadi Predtetchinski, William D. Sudderth, Xavier Venel

TL;DR
This paper investigates infinite-horizon Markov decision processes where strategies are evaluated using finitely additive measures, revealing conditions for the existence or absence of optimal strategies.
Contribution
It shows that the existence result implied by Neyman [2023] does not extend to arbitrary diffuse charges: once the time value of money principle is dropped, an MDP may have no optimal strategy, pure or randomized. The paper establishes this with an explicit counterexample.
Findings
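In every MDP with finite state and action spaces, a pure optimal strategy exists whenever the diffuse charge satisfies the time value of money principle (implied by Neyman [2023]).
A counterexample with a delicately constructed aggregation charge yields an MDP with no optimal strategy, neither pure nor randomized.
Whether an optimal strategy exists therefore depends on the properties of the aggregation charge, not only on the MDP.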
Abstract
We study infinite-horizon Markov decision processes (MDPs) where the decision maker evaluates each of her strategies by aggregating the infinite stream of expected stage-rewards. The crucial feature of our approach is that the aggregation is performed by means of a given diffuse charge (a diffuse finitely additive probability measure) on the set of stages. The results of Neyman [2023] imply that in this setting, in every MDP with finite state and action spaces, the decision maker has a pure optimal strategy as long as the diffuse charge satisfies the time value of money principle. His result raises the question of existence of an optimal strategy without additional assumptions on the aggregation charge. We answer this question in the negative with a counterexample. With a delicately constructed aggregation charge, the MDP has no optimal strategy at all, neither pure nor randomized.
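The aggregation described in the abstract can be written out as a short worked definition. This is a sketch only: the symbols $\sigma$, $r_t$, $x_t$, $\mu$, and $v_\mu$ are notation assumed here for illustration and may differ from the paper's, and the time value of money principle is deliberately not formalized because the abstract does not state it.

```latex
% Assumed notation: \sigma is a strategy, r_t the stage-t reward,
% \mu a charge (finitely additive probability measure) on the stages.
\begin{align*}
  x_t(\sigma) &= \mathbb{E}_{\sigma}[\, r_t \,], \qquad t \in \mathbb{N}
    && \text{(expected stage-reward, a bounded sequence)} \\
  \mu(\{t\}) &= 0 \quad \text{for every } t \in \mathbb{N}
    && \text{($\mu$ is diffuse, so finite sets of stages have weight $0$)} \\
  v_\mu(\sigma) &= \int_{\mathbb{N}} x_t(\sigma)\, d\mu(t)
    && \text{(evaluation of strategy $\sigma$)} \\
  \liminf_{t \to \infty} x_t(\sigma) &\le v_\mu(\sigma) \le \limsup_{t \to \infty} x_t(\sigma)
    && \text{(a consequence of diffuseness)}
\end{align*}
% A strategy \sigma^* is optimal if v_\mu(\sigma^*) \ge v_\mu(\sigma) for every strategy \sigma.
```

In particular, when the expected stage-rewards converge, every diffuse charge assigns the strategy their limit; different diffuse charges can disagree only on streams that oscillate, which is where the choice of charge matters.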
Taxonomy
Topics: Reinforcement Learning in Robotics · Game Theory and Voting Systems · Game Theory and Applications
