Simple Strategies in Multi-Objective MDPs (Technical Report)
Florent Delgrange, Joost-Pieter Katoen, Tim Quatmann, Mickael Randour

TL;DR
This paper investigates simple, implementable strategies for multi-objective Markov decision processes, providing complexity results and MILP-based solutions for verifying achievable reward trade-offs.
Contribution
It introduces a complexity analysis for pure strategies in multi-objective MDPs and offers MILP encodings and reductions for practical verification.
Findings
Checking pure stationary strategies is NP-complete for two objectives.
MILP encoding effectively solves the verification problem.
Experimental results demonstrate the feasibility of the proposed algorithms.
Abstract
We consider the verification of multiple expected reward objectives at once on Markov decision processes (MDPs). This enables a trade-off analysis among multiple objectives by obtaining the Pareto front. We focus on strategies that are easy to employ and implement. That is, strategies that are pure (no randomization) and have bounded memory. We show that checking whether a point is achievable by a pure stationary strategy is NP-complete, even for two objectives, and we provide an MILP encoding to solve the corresponding problem. The bounded memory case can be reduced to the stationary one by a product construction. Experimental results using \Storm and Gurobi show the feasibility of our algorithms.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsFormal Methods in Verification · Advanced Software Engineering Methodologies · Reinforcement Learning in Robotics
