Increasing the Value of Information During Planning in Uncertain Environments
Gaurab Pokharel

TL;DR
This paper introduces a new algorithm that enhances online planning in POMDPs by incorporating entropy into the UCB1 heuristic, improving decision-making when there are delays between information gathering and use.
Contribution
The paper proposes a novel modification to the POMCP algorithm: it adds entropy to the UCB1 action-selection heuristic so that information-gathering actions are valued more accurately during planning.
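For reference, the first line below is the standard UCB1 rule that POMCP uses to choose an action at history node h. V(ha) is the mean return, N(h) and N(ha) are visit counts, and c is the exploration constant. The second line is Shannon entropy over a belief b. This summary does not give the exact form of the paper's entropy term. The third line is therefore only one plausible reading: an additive bonus equal to the expected reduction in belief entropy, weighted by a hypothetical parameter β.

```latex
% Standard UCB1 action selection used by POMCP at history node h
a^{*} = \arg\max_{a} \left[ V(ha) + c \sqrt{\frac{\log N(h)}{N(ha)}} \right]

% Shannon entropy of a belief b over states s
H(b) = -\sum_{s} b(s) \log b(s)

% Assumed entropy-augmented variant (beta and the information-gain form are hypothetical)
a^{*} = \arg\max_{a} \left[ V(ha) + c \sqrt{\frac{\log N(h)}{N(ha)}}
        + \beta \left( H(b_h) - \mathbb{E}_{o}\!\left[ H(b_{hao}) \right] \right) \right]
```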
Findings
The new algorithm outperforms standard POMCP in the hallway problem.
Incorporating entropy improves the recognition of valuable information-gathering actions.
Results show significant performance gains in delayed-information scenarios.
Abstract
Prior studies have demonstrated that for many real-world problems, POMDPs can be solved by online algorithms both quickly and near-optimally. However, on an important class of problems, where there is a large time delay between when the agent can gather information and when it needs to use that information, these solutions fail to adequately account for the value of information. As a result, existing solutions ignore information-gathering actions even when those actions are critical to the optimal policy, and the agent makes sub-optimal decisions. In this research, we rectify this problem with a new algorithm that improves upon state-of-the-art online planning by better reflecting the value of information-gathering actions. We do this by adding entropy to the UCB1 heuristic in the POMCP algorithm. We test this solution on the…
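The sketch below shows how an entropy term could enter POMCP's UCB1 action selection. It assumes details that this summary does not specify. Beliefs are represented as particle sets, as in POMCP. The entropy bonus is assumed to be the expected reduction in particle-belief entropy across an action's observation children, weighted by a hypothetical `beta`. The names `ActionNode`, `obs_particles`, `belief_entropy`, and `select_action` are illustrative only. This is not the author's implementation.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def belief_entropy(particles):
    """Shannon entropy (in nats) of the empirical distribution over a particle set."""
    if not particles:
        return 0.0
    total = len(particles)
    return -sum((n / total) * math.log(n / total) for n in Counter(particles).values())


@dataclass
class ActionNode:
    visits: int = 0
    value: float = 0.0  # running mean return, V(ha)
    # Hypothetical layout: observation -> particles stored under that observation child.
    obs_particles: dict = field(default_factory=dict)

    def expected_posterior_entropy(self):
        """Entropy of each child belief, weighted by how often its observation occurred."""
        total = sum(len(p) for p in self.obs_particles.values())
        if total == 0:
            return 0.0
        return sum(len(p) / total * belief_entropy(p) for p in self.obs_particles.values())


def select_action(parent_visits, parent_particles, children, c=1.0, beta=0.5):
    """UCB1 plus an assumed, beta-weighted entropy-reduction (information gain) bonus."""
    h_parent = belief_entropy(parent_particles)
    best_action, best_score = None, -math.inf
    for action, node in children.items():
        if node.visits == 0:
            return action  # standard UCB1 convention: try every untried action once
        exploration = c * math.sqrt(math.log(parent_visits) / node.visits)
        # Assumption: an action with no observation data yet receives no entropy bonus.
        if node.obs_particles:
            info_gain = h_parent - node.expected_posterior_entropy()
        else:
            info_gain = 0.0
        score = node.value + exploration + beta * info_gain
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

Usage, in a toy setting loosely inspired by the hallway problem: the agent is equally unsure whether it is on the left or the right. A `sense` action splits that belief cleanly. Its slightly lower mean return is then outweighed by the entropy bonus, so `select_action(20, ["left", "right"], children)` prefers it. Setting `beta=0.0` recovers plain UCB1, which picks `move` instead.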
