Active Trajectory Estimation for Partially Observed Markov Decision Processes via Conditional Entropy
Timothy L. Molloy, Girish N. Nair

TL;DR
This paper introduces a novel active smoothing approach for POMDPs that directly minimizes the smoother entropy, i.e. the conditional entropy of the joint state trajectory given the observations, so that the state trajectory over a fixed horizon is estimated with minimal uncertainty.
Contribution
It reformulates the active smoothing problem as a (fully observed) belief-state MDP whose value function is concave in the belief state, so that standard POMDP approximation and solution techniques can be applied.
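The belief-state MDP view can be illustrated on a small, hypothetical sensor-selection problem. The matrices `A` and `B`, the timing convention that the observation y_{k+1} of x_{k+1} comes from the sensor chosen by u_k, and all function names are assumptions made for this sketch; they are not the paper's model or solution method. The belief is propagated by a Bayes filter, the per-stage term H(X_k | X_{k+1}) depends on the belief alone, and a brute-force check confirms that the optimal horizon-1 value is concave in the belief. That value is a concave stage term plus a minimum over controls of concave terms.

```python
import math
import random

# Hypothetical sensor-selection POMDP (illustrative values only): the control u
# chooses which sensor produces the next observation y_{k+1} of x_{k+1}.
A = [[0.9, 0.1], [0.2, 0.8]]        # A[i][j] = p(x_{k+1}=j | x_k=i), same for every u
B = {0: [[0.9, 0.1], [0.5, 0.5]],   # sensor 0: B[0][j][y] = p(y | x_{k+1}=j, u=0)
     1: [[0.6, 0.4], [0.1, 0.9]]}   # sensor 1
S = range(2)


def entropy(p):
    """Shannon entropy in nats of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0)


def predict(pi):
    """Predicted distribution p(x_{k+1} | y_{0:k}) = pi A."""
    return [sum(pi[i] * A[i][j] for i in S) for j in S]


def belief_update(pi, u, y):
    """Bayes filter step; returns p(y | pi, u) and the posterior belief over x_{k+1}."""
    unnorm = [p * B[u][j][y] for j, p in enumerate(predict(pi))]
    p_y = sum(unnorm)
    return p_y, [v / p_y for v in unnorm]


def stage_cost(pi):
    """H(X_k | X_{k+1}) when x_k ~ pi; a function of the belief only."""
    return entropy(pi) + sum(pi[i] * entropy(A[i]) for i in S) - entropy(predict(pi))


def expected_next_entropy(pi, u):
    """E_y[H(next belief)] after choosing sensor u from belief pi."""
    total = 0.0
    for y in range(2):
        p_y, nxt = belief_update(pi, u, y)
        total += p_y * entropy(nxt)
    return total


def value_horizon1(pi):
    """Optimal expected smoother entropy H(X_0, X_1 | Y) for horizon 1, given belief pi over x_0."""
    return stage_cost(pi) + min(expected_next_entropy(pi, u) for u in B)


def best_control(pi):
    """Sensor minimising the expected entropy of the next belief."""
    return min(B, key=lambda u: expected_next_entropy(pi, u))


# Numerically check concavity of the value function over random belief pairs.
random.seed(0)
violations = 0
for _ in range(1000):
    a, b, lam = random.random(), random.random(), random.random()
    p1, p2 = [a, 1 - a], [b, 1 - b]
    mix = [lam * x + (1 - lam) * z for x, z in zip(p1, p2)]
    chord = lam * value_horizon1(p1) + (1 - lam) * value_horizon1(p2)
    if value_horizon1(mix) < chord - 1e-12:
        violations += 1

print("concavity violations:", violations)
print("sensor chosen at pi = [0.5, 0.5]:", best_control([0.5, 0.5]))
```

The check only samples one toy model. Concavity holds here because each term is a conditional entropy of a joint distribution that is linear in the belief, and a minimum of concave functions is concave.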
Findings
In simulations, the proposed method reduces the uncertainty of the estimated state trajectory.
It outperforms prior approaches that minimize the sum of conditional entropies of the marginal state estimates provided by Bayesian filters.
The belief-based formulation enables scalable approximate solutions.
Abstract
In this paper, we consider the problem of controlling a partially observed Markov decision process (POMDP) in order to actively estimate its state trajectory over a fixed horizon with minimal uncertainty. We pose a novel active smoothing problem in which the objective is to directly minimise the smoother entropy, that is, the conditional entropy of the (joint) state trajectory distribution of concern in fixed-interval Bayesian smoothing. Our formulation contrasts with prior active approaches that minimise the sum of conditional entropies of the (marginal) state estimates provided by Bayesian filters. By establishing a novel form of the smoother entropy in terms of the POMDP belief (or information) state, we show that our active smoothing problem can be reformulated as a (fully observed) Markov decision process with a value function that is concave in the belief state. The concavity of…
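The contrast between the two objectives in the abstract can be made concrete on a tiny, hypothetical uncontrolled hidden Markov model. The values of `P0`, `A`, `B` and the horizon `T` are illustrative assumptions, not taken from the paper. The sketch computes the smoother entropy H(X_{0:T} | Y_{0:T}) by brute-force enumeration and the filter-based objective, the sum over k of H(X_k | Y_{0:k}). It also checks one way of writing the smoother entropy through filter beliefs alone, H(X_T | Y_{0:T}) + sum_{k<T} H(X_k | X_{k+1}, Y_{0:k}), which follows from the chain rule and the Markov property. It is offered as an illustration and may differ in form from the result established in the paper.

```python
import itertools
import math

# Hypothetical uncontrolled 2-state, 2-observation HMM (illustrative values only).
P0 = [0.5, 0.5]                # p(x_0)
A = [[0.9, 0.1], [0.2, 0.8]]   # A[i][j] = p(x_{k+1}=j | x_k=i)
B = [[0.8, 0.2], [0.3, 0.7]]   # B[i][y] = p(y_k=y | x_k=i)
T = 2                          # states x_0..x_T, observations y_0..y_T
S = O = range(2)               # state and observation alphabets


def entropy(p):
    """Shannon entropy in nats of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0)


def joint_prob(xs, ys):
    """p(x_{0:T}, y_{0:T}) for one state trajectory xs and observation sequence ys."""
    p = P0[xs[0]] * B[xs[0]][ys[0]]
    for k in range(1, len(xs)):
        p *= A[xs[k - 1]][xs[k]] * B[xs[k]][ys[k]]
    return p


def smoother_entropy():
    """H(X_{0:T} | Y_{0:T}) by brute-force enumeration of all trajectories."""
    h = 0.0
    for ys in itertools.product(O, repeat=T + 1):
        joint = [joint_prob(xs, ys) for xs in itertools.product(S, repeat=T + 1)]
        p_y = sum(joint)
        h += p_y * entropy([p / p_y for p in joint])
    return h


def forward(ys):
    """Unnormalised filter alpha[x] = p(x_k = x, y_{0:k}) for ys = y_{0:k}."""
    alpha = [P0[x] * B[x][ys[0]] for x in S]
    for y in ys[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in S) * B[j][y] for j in S]
    return alpha


def stage_cost(pi):
    """H(X_k | X_{k+1}, Y_{0:k}=y_{0:k}) written as a function of the filter belief pi."""
    pred = [sum(pi[i] * A[i][j] for i in S) for j in S]  # p(x_{k+1} | y_{0:k})
    return entropy(pi) + sum(pi[i] * entropy(A[i]) for i in S) - entropy(pred)


def expected_belief_sum(cost):
    """Sum over k = 0..T of E[cost(k, pi_k)], with pi_k = p(x_k | y_{0:k})."""
    total = 0.0
    for k in range(T + 1):
        for ys in itertools.product(O, repeat=k + 1):
            alpha = forward(ys)
            p_y = sum(alpha)  # p(y_{0:k})
            total += p_y * cost(k, [a / p_y for a in alpha])
    return total


smoother = smoother_entropy()
# Marginal objective used by filter-based approaches: sum of H(X_k | Y_{0:k}).
filter_sum = expected_belief_sum(lambda k, pi: entropy(pi))
# Chain-rule form: H(X_T | Y_{0:T}) + sum_{k<T} H(X_k | X_{k+1}, Y_{0:k}).
belief_form = expected_belief_sum(
    lambda k, pi: entropy(pi) if k == T else stage_cost(pi))

print(f"smoother entropy:         {smoother:.4f} nats")
print(f"belief-form (chain rule): {belief_form:.4f} nats")
print(f"sum of filter entropies:  {filter_sum:.4f} nats")
```

The first two numbers agree to floating-point precision. The smoother entropy is smaller than the sum of filter entropies because each H(X_k | X_{k+1}, Y_{0:k}) is at most H(X_k | Y_{0:k}). The two objectives are therefore different quantities, and minimising one need not minimise the other.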
