TL;DR
This paper introduces PA-POMCPOW, an online POMDP solver for problems with large action spaces. It scores every action by a linear combination of expected reward and expected information gain, then adds only the highest-scoring actions to the search tree.
Contribution
The paper presents PA-POMCPOW, an action sampling method for POMDP tree search that selects a subset of a large action space offering varying mixtures of exploitation and exploration.
Findings
Outperforms existing state-of-the-art solvers on problems with large discrete action spaces
Sampled actions provide varying mixtures of exploitation and exploration during search
Keeps the search tree tractable by expanding only the highest-scoring actions
Abstract
Online solvers for partially observable Markov decision processes have difficulty scaling to problems with large action spaces. This paper proposes a method called PA-POMCPOW to sample a subset of the action space that provides varying mixtures of exploitation and exploration for inclusion in a search tree. The proposed method first evaluates the action space according to a score function that is a linear combination of expected reward and expected information gain. The actions with the highest score are then added to the search tree during tree expansion. Experiments show that PA-POMCPOW is able to outperform existing state-of-the-art solvers on problems with large discrete action spaces.
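The scoring step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `weight` parameter, the form `reward + weight * info_gain` as one possible linear combination, and the toy action values are all assumptions made for illustration. The abstract does not say how expected reward or expected information gain are estimated, so both are passed in as plain lookup functions here.

```python
from typing import Callable, Hashable, List, Sequence


def score(expected_reward: float, expected_info_gain: float, weight: float) -> float:
    """Linear combination of expected reward and expected information gain.

    `weight` is a hypothetical trade-off parameter: 0 means pure exploitation,
    larger values favor information-gathering actions.
    """
    return expected_reward + weight * expected_info_gain


def select_actions(
    actions: Sequence[Hashable],
    reward_fn: Callable[[Hashable], float],
    info_gain_fn: Callable[[Hashable], float],
    weight: float,
    k: int,
) -> List[Hashable]:
    """Return the k highest-scoring actions, best first."""
    ranked = sorted(
        actions,
        key=lambda a: score(reward_fn(a), info_gain_fn(a), weight),
        reverse=True,
    )
    return ranked[:k]


# Toy, made-up estimates for four hypothetical actions.
REWARD = {"dig": 1.0, "wait": 0.0, "scan": -0.2, "probe": 0.1}
INFO_GAIN = {"dig": 0.0, "wait": 0.0, "scan": 0.9, "probe": 0.4}

# weight = 0 ranks by expected reward only; a large weight favors information gain.
exploit = select_actions(list(REWARD), REWARD.get, INFO_GAIN.get, weight=0.0, k=2)
explore = select_actions(list(REWARD), REWARD.get, INFO_GAIN.get, weight=5.0, k=2)
print(exploit, explore)
```

Varying `weight` is one way to obtain the different mixtures of exploitation and exploration that the abstract mentions. In a tree search, the returned subset would be the set of candidate actions considered at node expansion, instead of the full action space.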
