Entropy Maximization for Partially Observable Markov Decision Processes
Yagiz Savas, Michael Hibbard, Bo Wu, Takashi Tanaka, Ufuk Topcu

TL;DR
This paper develops methods to synthesize controllers for POMDPs that maximize trajectory unpredictability while satisfying reward constraints, using entropy bounds and finite-state controllers.
Contribution
It introduces a novel approach to maximize POMDP entropy via finite-state controllers and parameter synthesis, linking entropy to the induced parametric Markov chain.
Findings
The maximum entropy of a POMDP is lower bounded by the maximum entropy of the pMC induced by an FSC with deterministic memory transitions.
An algorithm locally maximizes the entropy for a fixed number of memory states.
Numerical results show the trade-off between entropy, memory states, and reward.
Abstract
We study the problem of synthesizing a controller that maximizes the entropy of a partially observable Markov decision process (POMDP) subject to a constraint on the expected total reward. Such a controller minimizes the predictability of an agent's trajectories to an outside observer while guaranteeing the completion of a task expressed by a reward function. We first prove that an agent with partial observations can achieve an entropy at most as high as that of an agent with perfect observations. Then, focusing on finite-state controllers (FSCs) with deterministic memory transitions, we show that the maximum entropy of a POMDP is lower bounded by the maximum entropy of the parametric Markov chain (pMC) induced by such FSCs. This relationship allows us to recast the entropy maximization problem as a so-called parameter synthesis problem for the induced pMC. We then present an algorithm to…
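To make the quantity being maximized concrete, here is a minimal Python sketch. It is not the paper's algorithm. It assumes that the FSC parameters have already been fixed to numbers, so the induced pMC is an ordinary Markov chain, and that "entropy" means the entropy of the chain's trajectory distribution, which matches the abstract's reading of entropy as trajectory unpredictability. The function names, the example chains, and the assumption that every state reaches an absorbing state are illustrative only. The sketch uses the chain-rule recursion H(s) = L(s) + Σ_t P(s,t) H(t), where L(s) is the entropy of the next-state distribution at s and H is zero at absorbing states.

```python
import math


def local_entropy(row):
    """Entropy (in bits) of one next-state distribution."""
    return -sum(p * math.log2(p) for p in row if p > 0.0)


def trajectory_entropy(P, start, absorbing, sweeps=1000, tol=1e-12):
    """Entropy of the trajectory distribution of an absorbing Markov chain.

    P is a row-stochastic transition matrix given as a list of lists.
    Assumes every state reaches an absorbing state with probability 1,
    so the fixed-point iteration below converges.
    """
    n = len(P)
    H = [0.0] * n  # H[s] stays 0 at absorbing states
    for _ in range(sweeps):
        delta = 0.0
        for s in range(n):
            if s in absorbing:
                continue
            new = local_entropy(P[s]) + sum(P[s][t] * H[t] for t in range(n))
            delta = max(delta, abs(new - H[s]))
            H[s] = new
        if delta < tol:
            break
    return H[start]


# Hypothetical chain 1: state 0 branches evenly to 1 or 2; state 2 is absorbing.
branching = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]

# Hypothetical chain 2: state 0 loops on itself or exits, each with probability 0.5.
looping = [
    [0.5, 0.5],
    [0.0, 1.0],
]

h_branching = trajectory_entropy(branching, start=0, absorbing={2})
h_looping = trajectory_entropy(looping, start=0, absorbing={1})
```

The first chain has two equally likely trajectories, so its entropy is 1 bit. The second has a geometrically distributed exit time and an entropy of 2 bits. A synthesis procedure would search over the parameters that define `P`; this sketch only evaluates one fixed instantiation.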
Taxonomy
Topics: Formal Methods in Verification · Petri Nets in System Modeling
