How to Explore with Belief: State Entropy Maximization in POMDPs
Riccardo Zamboni, Duilio Cirino, Marcello Restelli, Mirco Mutti

TL;DR
This paper introduces a policy gradient method for maximizing true state entropy in partially observable environments, addressing the challenge of belief-based decision making in POMDPs.
Contribution
It develops an efficient policy gradient approach for entropy maximization over belief states, extending prior work to partially observable settings.
Findings
Proposes a memory-efficient policy gradient algorithm for belief-based entropy maximization.
Provides formal analysis of approximation gaps and the optimization landscape.
Addresses hallucination issues in belief-based policies.
Abstract
Recent works have studied *state entropy maximization* in reinforcement learning, in which the agent's objective is to learn a policy inducing high entropy over state visitation (Hazan et al., 2019). These works typically assume full observability of the state of the system, so that the entropy of the observations is maximized. In practice, the agent may only get *partial* observations, e.g., a robot perceiving the state of a physical space through proximity sensors and cameras. A significant mismatch between the entropy over observations and the entropy over true states of the system can arise in those settings. In this paper, we address the problem of entropy maximization over the *true states* with a decision policy conditioned on partial observations *only*. The latter is a generalization of POMDPs, which is intractable in general. We develop a memory and computationally efficient *policy gradient* method…
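The mismatch between observation entropy and true-state entropy mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's method: the four-state environment, the aliasing map `observe`, and the two visitation sequences are hypothetical, chosen only to show that two behaviors with identical observation entropy can differ in true-state entropy.

```python
import math
from collections import Counter


def empirical_entropy(samples):
    """Shannon entropy (in nats) of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical sensor: four true states are aliased into two observations.
observe = {0: "A", 1: "A", 2: "B", 3: "B"}

# Two hypothetical visitation sequences of equal length.
visits = {
    "uniform": [0, 1, 2, 3] * 25,  # covers all four true states equally
    "narrow": [0, 2] * 50,         # covers only two of the four true states
}

results = {}
for name, states in visits.items():
    observations = [observe[s] for s in states]
    results[name] = (empirical_entropy(states), empirical_entropy(observations))
    print(f"{name}: H(states)={results[name][0]:.3f}, "
          f"H(observations)={results[name][1]:.3f}")
```

In this toy case both sequences yield the same observation entropy (log 2), while their true-state entropies differ (log 4 versus log 2), so an agent that only maximizes observation entropy could not tell them apart.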
Taxonomy
Topics: Advanced Control Systems Optimization · Fault Detection and Control Systems · Distributed Sensor Networks and Detection Algorithms
