MDPs with a State Sensing Cost

Vansh Kapoor; Jayakrishnan Nair

arXiv:2505.03280 · cs.LG · April 16, 2026

TL;DR

This paper studies sequential decision-making in MDPs where sensing the state of the environment incurs a cost. It derives lower bounds on the optimal value function and proposes an efficient algorithm (SPI) that performs close to the optimal policy in practice.

Contribution

The paper formulates the sensing-cost problem as an expected discounted cost MDP with an expanded (countably infinite) state space, derives lower bounds on the optimal value function, and introduces a practical algorithm for near-optimal decision-making.

Findings

1. Lower bounds on the optimal value function are established.

2. The SPI algorithm performs close to the optimal policy in practice.

3. Benchmarking shows the effectiveness of the proposed approach.

Abstract

In many practical sequential decision-making problems, tracking the state of the environment incurs a sensing/communication/computation cost. In these settings, the agent's interaction with its environment includes the additional component of deciding when to sense the state, in a manner that balances the value associated with optimal (state-specific) actions and the cost of sensing. We formulate this as an expected discounted cost Markov Decision Process (MDP), wherein the agent incurs an additional cost for sensing its next state, but has the option to take actions while remaining 'blind' to the system state. We pose this problem as a classical discounted cost MDP with an expanded (countably infinite) state space. While computing the optimal policy for this MDP is intractable in general, we derive lower bounds on the optimal value function, which allow us to bound the suboptimality…
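To make the 'blind' setting concrete, the following is a minimal sketch and not the paper's construction. It assumes one natural encoding of the expanded state: the last sensed state together with the sequence of actions taken since. From that pair the agent can compute a belief over the current state. The names `blind_belief` and `expected_step_cost`, the two-state transition matrices `P`, the cost table `c`, and the sensing cost of 0.5 are all hypothetical, chosen only for illustration.

```python
import numpy as np


def blind_belief(P, last_state, actions):
    """Belief over the current state after sensing `last_state` and then
    applying `actions` without sensing again.

    P[a] is the row-stochastic transition matrix for action a (assumed known).
    """
    n = P[0].shape[0]
    belief = np.zeros(n)
    belief[last_state] = 1.0          # state was known exactly when last sensed
    for a in actions:
        belief = belief @ P[a]        # propagate one blind step under action a
    return belief


def expected_step_cost(c, belief, action, sense_next, sensing_cost):
    """Expected one-step cost under `belief`, plus the sensing cost if the
    agent chooses to sense its next state.

    c[s, a] is the cost of taking action a in state s (assumed known).
    """
    cost = float(belief @ c[:, action])
    return cost + (sensing_cost if sense_next else 0.0)


# Hypothetical two-state, two-action example.
P = [
    np.array([[0.9, 0.1], [0.2, 0.8]]),   # action 0
    np.array([[0.5, 0.5], [0.5, 0.5]]),   # action 1
]
c = np.array([[1.0, 2.0], [4.0, 0.0]])

# Sensed state 0, then took action 0 twice while blind.
belief = blind_belief(P, last_state=0, actions=[0, 0])
step_cost = expected_step_cost(c, belief, action=0, sense_next=True, sensing_cost=0.5)
```

Because the action history can grow without bound, the set of (last sensed state, action sequence) pairs is countably infinite, which is consistent with the expanded state space the abstract describes. Under this assumed encoding, sensing collapses the belief back to a single known state.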
