C-IDS: Solving Contextual POMDP via Information-Directed Objective

Chongyang Shi; Michael Dorothy; Jie Fu

arXiv:2602.03939 · eess.SY · February 5, 2026

Open Access

TL;DR

This paper introduces C-IDS, an algorithm for policy synthesis in contextual POMDPs that balances reward maximization against active reduction of uncertainty about the environment's latent context, yielding faster context identification and higher returns.

Contribution

The paper proposes an information-directed objective and the C-IDS algorithm for contextual POMDPs, providing theoretical analysis and empirical validation of its effectiveness.

Findings

01. C-IDS outperforms standard POMDP solvers in continuous environments.

02. C-IDS identifies the latent context faster and achieves higher returns.

03. A sublinear Bayesian regret bound is established for the approach.

Abstract

We study the policy synthesis problem in contextual partially observable Markov decision processes (CPOMDPs), where the environment is governed by an unknown latent context that induces distinct POMDP dynamics. Our goal is to design a policy that simultaneously maximizes cumulative return and actively reduces uncertainty about the underlying context. We introduce an information-directed objective that augments reward maximization with mutual information between the latent context and the agent's observations. We develop the C-IDS algorithm to synthesize policies that maximize the information-directed objective. We show that the objective can be interpreted as a Lagrangian relaxation of the linear information ratio and prove that the temperature parameter is an upper bound on the information ratio. Based on this characterization, we establish a sublinear Bayesian regret bound over K…
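The information-directed objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a discrete context and a discrete observation, a one-step objective, and that the temperature enters as a multiplicative weight on the mutual information term. The names `mutual_information`, `info_directed_score`, `belief`, `obs_lik`, and `temperature` are hypothetical and chosen only for illustration.

```python
import numpy as np


def mutual_information(belief, obs_lik):
    """Mutual information I(C; O) in nats.

    belief[c]     : current probability of context c (assumed discrete).
    obs_lik[c, o] : P(o | c), the observation likelihood under context c.
    """
    belief = np.asarray(belief, dtype=float)
    obs_lik = np.asarray(obs_lik, dtype=float)
    joint = belief[:, None] * obs_lik            # P(c, o)
    marg_o = joint.sum(axis=0)                   # P(o)
    indep = belief[:, None] * marg_o[None, :]    # P(c) * P(o)
    mask = joint > 0                             # skip zero-probability cells
    return float(np.sum(joint[mask] * np.log(joint[mask] / indep[mask])))


def info_directed_score(expected_reward, belief, obs_lik, temperature):
    """Reward plus weighted information gain (assumed form of the objective)."""
    return expected_reward + temperature * mutual_information(belief, obs_lik)


# Two hypothetical actions under a uniform belief over two contexts:
# "exploit" pays more but its observation reveals nothing about the context;
# "probe" pays less but its observation identifies the context exactly.
belief = [0.5, 0.5]
exploit = info_directed_score(1.0, belief, [[0.5, 0.5], [0.5, 0.5]], temperature=1.0)
probe = info_directed_score(0.6, belief, [[1.0, 0.0], [0.0, 1.0]], temperature=1.0)
best = "probe" if probe > exploit else "exploit"
```

With these illustrative numbers the probing action wins because its information term (log 2 nats) outweighs the reward gap of 0.4; setting `temperature` to 0 recovers pure reward maximization and selects the exploiting action.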

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference