Entropy Maximization for Markov Decision Processes Under Temporal Logic Constraints
Yagiz Savas, Melkior Ornik, Murat Cubuktepe, Mustafa O. Karabag, Ufuk Topcu

TL;DR
This paper develops a method to synthesize policies for Markov decision processes that maximize entropy while satisfying temporal logic constraints, making the generated paths as unpredictable (equivalently, as exploratory) as the specification allows.
Contribution
It introduces conditions for the finiteness of MDP entropy and presents a convex optimization-based algorithm for entropy maximization under temporal logic constraints.
Findings
Maximum entropy of an MDP can be finite, infinite, or unbounded.
The proposed algorithm, based on a convex optimization problem, synthesizes a policy that maximizes the entropy of an MDP.
Numerical examples demonstrate the trade-off between path restrictions and entropy.
Abstract
We study the problem of synthesizing a policy that maximizes the entropy of a Markov decision process (MDP) subject to a temporal logic constraint. Such a policy minimizes the predictability of the paths it generates, or dually, maximizes the exploration of different paths in an MDP while ensuring the satisfaction of a temporal logic specification. We first show that the maximum entropy of an MDP can be finite, infinite or unbounded. We provide necessary and sufficient conditions under which the maximum entropy of an MDP is finite, infinite or unbounded. We then present an algorithm which is based on a convex optimization problem to synthesize a policy that maximizes the entropy of an MDP. We also show that maximizing the entropy of an MDP is equivalent to maximizing the entropy of the paths that reach a certain set of states in the MDP. Finally, we extend the algorithm to an MDP…
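To make the quantity being maximized concrete, the following is a minimal sketch that evaluates the path entropy of the Markov chain induced by one fixed policy. It is not the paper's convex program, which searches over policies. It assumes the chain is absorbing (every path eventually reaches an absorbing state) and uses the standard identity that the path entropy equals the sum, over transient states, of the expected number of visits times the entropy of that state's outgoing transition distribution. The function name `path_entropy` and its arguments are hypothetical, chosen only for illustration.

```python
import numpy as np

def path_entropy(P, init, transient):
    """Entropy (in nats) of the paths of an absorbing Markov chain.

    P         : full row-stochastic transition matrix (assumed absorbing chain)
    init      : initial state distribution over all states
    transient : indices of the transient (non-absorbing) states
    """
    P = np.asarray(P, dtype=float)
    Q = P[np.ix_(transient, transient)]            # transient-to-transient block
    alpha = np.asarray(init, dtype=float)[transient]

    # Expected number of visits to each transient state:
    # xi^T = alpha^T (I - Q)^{-1}, solved as (I - Q)^T xi = alpha.
    xi = np.linalg.solve((np.eye(len(transient)) - Q).T, alpha)

    # Local entropy of each transient state's outgoing distribution,
    # with the convention 0 * log 0 = 0.
    rows = P[transient]
    with np.errstate(divide="ignore", invalid="ignore"):
        logs = np.where(rows > 0, np.log(rows), 0.0)
    local = -(rows * logs).sum(axis=1)

    return float(xi @ local)

# Toy chain: state 0 loops with probability 0.5 or moves to absorbing state 1.
# The path length is geometric, so the entropy is 2 * ln(2) nats.
P = [[0.5, 0.5],
     [0.0, 1.0]]
print(path_entropy(P, init=[1.0, 0.0], transient=[0]))
```

In the toy chain the self-loop probability controls the entropy: as it approaches 1 the expected number of visits grows without bound, which gives some intuition for why the maximum entropy of an MDP need not be finite.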
