Restricted Value Iteration: Theory and Algorithms
N. L. Zhang, W. Zhang

TL;DR
This paper studies restricted value iteration for POMDPs, in which value iteration is carried out over a subset of the belief space rather than all of it. With a properly chosen belief subset the method still yields near-optimal policies and can save space and time. The paper applies it to informative and near-discernible POMDPs.
Contribution
It proposes value iteration restricted to belief subsets, gives a condition for determining whether a given belief subset brings savings in space and time, and applies the method to two specific classes of POMDPs.
Findings
Restricted value iteration yields near-optimal policies when the belief subset is properly chosen.
A stated condition determines whether a given belief subset brings savings in space and time.
Application to informative and near-discernible POMDPs demonstrates practical benefits.
Abstract
Value iteration is a popular algorithm for finding near-optimal policies for POMDPs. It is inefficient due to the need to account for the entire belief space, which necessitates the solution of large numbers of linear programs. In this paper, we study value iteration restricted to belief subsets. We show that, together with properly chosen belief subsets, restricted value iteration yields near-optimal policies and we give a condition for determining whether a given belief subset would bring about savings in space and time. We also apply restricted value iteration to two interesting classes of POMDPs, namely informative POMDPs and near-discernible POMDPs.
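To make the idea of restricting value iteration to part of the belief space concrete, here is a minimal sketch. It is not the algorithm from the paper: it swaps in a simple point-based backup over a finite list of beliefs, which avoids linear programs altogether, whereas the paper works with belief subsets in general. The toy POMDP (the `T`, `O`, `R` tables and `GAMMA`) and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy POMDP: 2 states, 2 actions, 2 observations.
# All numbers are illustrative assumptions, not taken from the paper.
GAMMA = 0.9
N_S, N_A, N_Z = 2, 2, 2
T = [  # T[a][s][s2] = P(s2 | s, a)
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.5, 0.5], [0.5, 0.5]],
]
O = [  # O[a][s2][z] = P(z | s2, a)
    [[0.8, 0.2], [0.2, 0.8]],
    [[0.5, 0.5], [0.5, 0.5]],
]
R = [  # R[a][s] = immediate reward for action a in state s
    [1.0, 0.0],
    [0.0, 0.5],
]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def point_backup(b, alphas):
    """One Bellman backup at belief b; returns the alpha vector that is best at b."""
    best = None
    for a in range(N_A):
        vec = list(R[a])
        for z in range(N_Z):
            # Project every current alpha vector back through (a, z).
            candidates = [
                [
                    sum(T[a][s][s2] * O[a][s2][z] * alpha[s2] for s2 in range(N_S))
                    for s in range(N_S)
                ]
                for alpha in alphas
            ]
            # Keep only the projection that is best at this belief.
            g = max(candidates, key=lambda c: dot(b, c))
            vec = [v + GAMMA * gs for v, gs in zip(vec, g)]
        if best is None or dot(b, vec) > dot(b, best):
            best = vec
    return best


def restricted_value_iteration(beliefs, n_iters=50):
    """Repeat backups only at the given beliefs (the restricted set)."""
    alphas = [[0.0] * N_S]
    for _ in range(n_iters):
        alphas = [point_backup(b, alphas) for b in beliefs]
    return alphas


def value(b, alphas):
    """Value estimate at belief b: the upper surface of the alpha vectors."""
    return max(dot(b, alpha) for alpha in alphas)


beliefs = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
alphas = restricted_value_iteration(beliefs)
```

Each iteration keeps one vector per belief in the list, so the representation stays bounded by the size of the restricted set. How close the result is to the value function over the full belief space depends on how the subset is chosen, which is the question the paper addresses.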
