C-IDS: Solving Contextual POMDP via Information-Directed Objective
Chongyang Shi, Michael Dorothy, Jie Fu

TL;DR
This paper introduces C-IDS, an algorithm for policy synthesis in contextual POMDPs that balances reward maximization against active uncertainty reduction about the environment's latent context, yielding faster context identification and higher returns than standard POMDP solvers.
Contribution
The paper proposes an information-directed objective and the C-IDS algorithm for contextual POMDPs, providing theoretical analysis and empirical validation of its effectiveness.
Findings
C-IDS outperforms standard POMDP solvers in continuous environments.
It achieves faster context identification and higher returns.
A sublinear Bayesian regret bound is established for the approach.
Abstract
We study the policy synthesis problem in contextual partially observable Markov decision processes (CPOMDPs), where the environment is governed by an unknown latent context that induces distinct POMDP dynamics. Our goal is to design a policy that simultaneously maximizes cumulative return and actively reduces uncertainty about the underlying context. We introduce an information-directed objective that augments reward maximization with mutual information between the latent context and the agent's observations. We develop the C-IDS algorithm to synthesize policies that maximize the information-directed objective. We show that the objective can be interpreted as a Lagrangian relaxation of the linear information ratio and prove that the temperature parameter is an upper bound on the information ratio. Based on this characterization, we establish a sublinear Bayesian regret bound over K…
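To make the information-directed objective concrete, here is a minimal one-step sketch in Python. It is not the paper's C-IDS algorithm: it assumes finite sets of contexts, actions, and observations, scores each action myopically as expected reward plus a temperature `lam` times the mutual information between the context and the next observation, and uses hypothetical names throughout (`mutual_information`, `information_directed_scores`, `update_belief`). The exact objective in the paper, which is defined over policies and trajectories, may differ.

```python
import numpy as np


def mutual_information(belief, obs_lik):
    """I(C; O) in nats, given a prior belief[c] and a likelihood obs_lik[c, o] = P(o | c)."""
    joint = belief[:, None] * obs_lik            # P(c, o)
    marg_o = joint.sum(axis=0)                   # P(o)
    indep = belief[:, None] * marg_o[None, :]    # P(c) * P(o)
    mask = joint > 0                             # skip zero-probability cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log(joint[mask] / indep[mask])))


def information_directed_scores(belief, rewards, obs_lik, lam):
    """Score each action as expected reward + lam * information gain (assumed form).

    rewards[c, a] is the reward of action a in context c.
    obs_lik[a, c, o] is P(o | c, a).
    """
    expected_reward = belief @ rewards
    info = np.array([mutual_information(belief, obs_lik[a])
                     for a in range(rewards.shape[1])])
    return expected_reward + lam * info, expected_reward, info


def update_belief(belief, obs_lik_a, o):
    """Bayes update of the context belief after seeing observation o under one action."""
    posterior = belief * obs_lik_a[:, o]
    return posterior / posterior.sum()


# Toy problem (illustrative numbers only): two contexts, two actions.
belief = np.array([0.5, 0.5])
rewards = np.array([[1.0, 0.8],      # context 0: reward of action 0, action 1
                    [1.0, 0.8]])     # context 1
obs_lik = np.array([
    [[0.5, 0.5], [0.5, 0.5]],        # action 0: observation says nothing about context
    [[1.0, 0.0], [0.0, 1.0]],        # action 1: observation reveals the context
])

greedy_scores, _, info = information_directed_scores(belief, rewards, obs_lik, lam=0.0)
ids_scores, _, _ = information_directed_scores(belief, rewards, obs_lik, lam=0.5)
print("reward-only choice:", int(np.argmax(greedy_scores)))
print("information-directed choice:", int(np.argmax(ids_scores)))
print("belief after probing:", update_belief(belief, obs_lik[1], o=0))
```

In this toy setting, a reward-only agent never takes the slightly costlier probing action, while a positive `lam` makes the agent pay a small reward penalty to resolve the context. This is the trade-off the abstract describes, reduced to a single decision.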
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference
