Rollout Sampling Policy Iteration for Decentralized POMDPs
Feng Wu, Shlomo Zilberstein, Xiaoping Chen

TL;DR
DecRSPI is a scalable Monte Carlo-based algorithm for multi-agent decision-making in DEC-POMDPs, capable of solving larger problems efficiently with good solution quality.
Contribution
Introduces DecRSPI, a novel decentralized rollout sampling policy iteration algorithm that improves scalability and handles large DEC-POMDPs without explicit models.
Findings
Linear time complexity over number of agents
Bounded memory usage
Effective on large, intractable problems
Abstract
We present decentralized rollout sampling policy iteration (DecRSPI), a new algorithm for multi-agent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte Carlo methods to generate a sample of reachable belief states. Then it computes a joint policy for each belief state based on the rollout estimations. A new policy representation allows us to represent solutions compactly. The key benefits of the algorithm are its linear time complexity over the number of agents, its bounded memory usage and good solution quality. It can solve larger problems that are intractable for existing planning algorithms. Experimental results confirm the effectiveness and scalability of the approach.
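The rollout estimation step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' DecRSPI implementation: the simulator `toy_step`, the helper names, and the two-agent toy problem are all assumptions made for illustration. The sketch only shows the general idea of scoring a joint action by Monte Carlo rollouts through a black-box simulator, starting from states sampled from a belief. For simplicity it enumerates all joint actions, so it does not reproduce the linear scaling in the number of agents that the paper reports.

```python
import itertools
import random


def toy_step(state, joint_action, rng):
    """Hypothetical black-box simulator (an assumption, not from the paper).

    Reward is 1 when every agent's action matches the hidden state;
    the next state is then drawn uniformly from {0, 1}.
    """
    reward = 1.0 if all(a == state for a in joint_action) else 0.0
    next_state = rng.choice([0, 1])
    return next_state, reward


def rollout_value(step, belief_samples, joint_action, default_policy,
                  horizon, n_rollouts, rng):
    """Monte Carlo estimate of taking joint_action first, then default_policy."""
    total = 0.0
    for _ in range(n_rollouts):
        state = rng.choice(belief_samples)  # draw a state from the sampled belief
        state, ret = step(state, joint_action, rng)
        for _ in range(horizon - 1):
            state, reward = step(state, default_policy(rng), rng)
            ret += reward
        total += ret
    return total / n_rollouts


def best_joint_action(step, belief_samples, actions, n_agents, default_policy,
                      horizon, n_rollouts, rng):
    """Pick the joint action with the highest rollout estimate (brute force)."""
    candidates = itertools.product(actions, repeat=n_agents)
    return max(candidates, key=lambda ja: rollout_value(
        step, belief_samples, ja, default_policy, horizon, n_rollouts, rng))


if __name__ == "__main__":
    rng = random.Random(0)
    belief = [1] * 9 + [0]  # sampled belief: state 1 is much more likely
    uniform = lambda r: (r.choice([0, 1]), r.choice([0, 1]))
    print(best_joint_action(toy_step, belief, [0, 1], 2, uniform,
                            horizon=1, n_rollouts=2000, rng=rng))
```

Because the estimate needs only samples from `step`, no explicit transition or observation model is required, which is the property the abstract highlights for problems that lack one.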
Taxonomy
Topics: Auction Theory and Applications · Reinforcement Learning in Robotics · Optimization and Search Problems
