Rollout Sampling Policy Iteration for Decentralized POMDPs

Feng Wu; Shlomo Zilberstein; Xiaoping Chen

arXiv:1203.3528 · cs.AI · March 19, 2012 · 21 cites

TL;DR

DecRSPI is a scalable Monte Carlo-based algorithm for multi-agent decision-making in DEC-POMDPs, capable of solving larger problems efficiently with good solution quality.

Contribution

Introduces DecRSPI, a novel decentralized rollout sampling policy iteration algorithm that improves scalability and handles large DEC-POMDPs without explicit models.

Findings

1. Time complexity linear in the number of agents

2. Bounded memory usage

3. Effective on large problems that are intractable for existing planning algorithms

Abstract

We present decentralized rollout sampling policy iteration (DecRSPI), a new algorithm for multi-agent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte Carlo methods to generate a sample of reachable belief states. Then it computes a joint policy for each belief state based on the rollout estimations. A new policy representation allows us to represent solutions compactly. The key benefits of the algorithm are its linear time complexity over the number of agents, its bounded memory usage, and its good solution quality. It can solve larger problems that are intractable for existing planning algorithms. Experimental results confirm the effectiveness and scalability of the approach.
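The two steps named in the abstract (sampling reachable belief states, then scoring joint actions by rollouts) can be illustrated with a minimal sketch. This is not the authors' DecRSPI implementation: the toy two-agent problem, the `step` simulator, and every function name below are hypothetical, and beliefs are approximated by plain lists of sampled states. The sketch also enumerates all joint actions by brute force, which is exponential in the number of agents and therefore does not reproduce the linear scaling the paper reports.

```python
import random
from itertools import product

ACTIONS = (0, 1)   # per-agent action set (toy assumption)
N_AGENTS = 2       # number of agents (toy assumption)

def step(state, joint_action, rng):
    """Hypothetical black-box simulator: returns (next_state, reward).
    Reward is 1 only when every agent's action matches the hidden state."""
    reward = 1.0 if all(a == state for a in joint_action) else 0.0
    next_state = rng.choice((0, 1))
    return next_state, reward

def random_joint_action(rng):
    """Uniformly random joint action, used as the rollout policy."""
    return tuple(rng.choice(ACTIONS) for _ in range(N_AGENTS))

def sample_belief(t, n_particles, rng):
    """Monte Carlo sample of states reachable after t random steps from state 0."""
    particles = []
    for _ in range(n_particles):
        state = 0
        for _ in range(t):
            state, _ = step(state, random_joint_action(rng), rng)
        particles.append(state)
    return particles

def rollout_value(belief, joint_action, depth, n_rollouts, rng):
    """Average return of taking joint_action first, then acting randomly."""
    total = 0.0
    for _ in range(n_rollouts):
        state = rng.choice(belief)                 # draw a state from the belief
        state, ret = step(state, joint_action, rng)
        for _ in range(depth - 1):
            state, r = step(state, random_joint_action(rng), rng)
            ret += r
        total += ret
    return total / n_rollouts

def best_joint_action(belief, depth, n_rollouts, rng):
    """Brute-force choice of the joint action with the highest rollout estimate."""
    candidates = product(ACTIONS, repeat=N_AGENTS)
    return max(candidates,
               key=lambda ja: rollout_value(belief, ja, depth, n_rollouts, rng))
```

Only the simulator is queried, never transition or observation tables, which mirrors the abstract's point that no explicit model is required. For example, `best_joint_action(sample_belief(0, 20, rng), 3, 500, rng)` scores the four joint actions at the initial belief using sampled returns alone.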

Taxonomy

Topics: Auction Theory and Applications · Reinforcement Learning in Robotics · Optimization and Search Problems