MDPs with a State Sensing Cost
Vansh Kapoor, Jayakrishnan Nair

TL;DR
This paper studies decision-making in MDPs where sensing the state of the environment incurs a cost. It derives lower bounds on the optimal value function and proposes an efficient algorithm that performs close to the optimal policy in practice.
Contribution
It formulates an expected discounted cost MDP with a state sensing cost, derives lower bounds on the optimal value function, and introduces a practical algorithm for near-optimal decision-making.
Findings
Lower bounds on the optimal value function are established.
The SPI algorithm performs close to the optimal policy in practice.
Benchmarking shows the effectiveness of the proposed approach.
Abstract
In many practical sequential decision-making problems, tracking the state of the environment incurs a sensing/communication/computation cost. In these settings, the agent's interaction with its environment includes the additional component of deciding when to sense the state, in a manner that balances the value associated with optimal (state-specific) actions and the cost of sensing. We formulate this as an expected discounted cost Markov Decision Process (MDP), wherein the agent incurs an additional cost for sensing its next state, but has the option to take actions while remaining 'blind' to the system state. We pose this problem as a classical discounted cost MDP with an expanded (countably infinite) state space. While computing the optimal policy for this MDP is intractable in general, we derive lower bounds on the optimal value function, which allow us to bound the suboptimality…
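To make the expanded state space concrete, here is a minimal Python sketch. It is not taken from the paper. It assumes one natural encoding in which an expanded state is the pair (last sensed state, sequence of actions taken blind since then). Because that sequence can be arbitrarily long, the set of such pairs is countably infinite. The transition matrices, costs, sensing cost, and function names below are all hypothetical, as is the convention that the sensing fee is added to the stage cost of the step in which the agent chooses to sense.

```python
import numpy as np

# Hypothetical 2-state, 2-action example; every number here is illustrative.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # P[0][s, s']: transitions under action 0
    [[0.5, 0.5], [0.5, 0.5]],   # P[1][s, s']: transitions under action 1
])
cost = np.array([[1.0, 2.0],    # cost[s, a]: cost of action a in state s
                 [3.0, 0.0]])
SENSING_COST = 0.5              # assumed fee for observing the next state


def expanded_state_belief(last_state, blind_actions, P):
    """Distribution over the true state for the expanded state
    (last_state, blind_actions), obtained by pushing a point mass
    through the transition matrix of each blind action in turn."""
    belief = np.zeros(P.shape[1])
    belief[last_state] = 1.0
    for a in blind_actions:
        belief = belief @ P[a]
    return belief


def expected_stage_cost(belief, cost, action, sense, sensing_cost):
    """Expected immediate cost of `action` under `belief`, plus the
    sensing fee if the agent chooses to observe the next state."""
    return float(belief @ cost[:, action]) + (sensing_cost if sense else 0.0)


# The agent last sensed state 0 and has since taken action 0 twice while blind.
belief = expanded_state_belief(0, [0, 0], P)
stage = expected_stage_cost(belief, cost, action=0, sense=True,
                            sensing_cost=SENSING_COST)
print(belief, stage)
```

Under this encoding, choosing to sense collapses the expanded state back to a freshly observed state with an empty action sequence, while acting blind appends one more action to the sequence. The trade-off described in the abstract is between paying the sensing fee and acting on an increasingly diffuse belief.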
