Optimal and Approximate Q-value Functions for Decentralized POMDPs
Frans A. Oliehoek, Matthijs T. J. Spaan, Nikos Vlassis

TL;DR
This paper explores defining and approximating Q-value functions for decentralized POMDPs, enabling more efficient policy computation and providing bounds on optimal solutions, with practical algorithms and experimental validation.
Contribution
It introduces two forms of optimal Q-value functions for Dec-POMDPs and analyzes approximate versions, unifying previous methods and offering new algorithms for policy extraction.
Findings
All of the approximate Q-value functions analyzed provide an upper bound on the optimal Q-value function Q*.
Proposed algorithms can extract policies from approximate Q-values.
Experimental results validate the effectiveness of the approaches on test problems.
Abstract
Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the…
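The single-agent recipe mentioned in the abstract (compute Q* recursively by dynamic programming, then extract a policy from it) can be illustrated with a minimal sketch. This is not code from the paper: the two-state, two-action MDP below, the names `P`, `R`, `optimal_q` and `greedy_policy`, and the finite-horizon, undiscounted setting are all assumptions made purely for illustration. It covers only a fully observable single-agent MDP, not the Dec-POMDP case the paper studies.

```python
# Hypothetical toy MDP (illustrative assumption, not taken from the paper).
STATES = [0, 1]
ACTIONS = [0, 1]

# P[s][a][s2] = probability of moving from state s to s2 under action a.
P = {
    0: {0: [1.0, 0.0], 1: [0.2, 0.8]},
    1: {0: [0.0, 1.0], 1: [0.9, 0.1]},
}

# R[s][a] = immediate reward for taking action a in state s.
R = {
    0: {0: 0.0, 1: -1.0},
    1: {0: 2.0, 1: 0.0},
}


def optimal_q(horizon):
    """Backward induction over stages; returns q[t][s][a] for t = 0..horizon-1."""
    q = [None] * horizon
    next_v = {s: 0.0 for s in STATES}  # no reward is collected after the last stage
    for t in reversed(range(horizon)):
        q[t] = {
            s: {
                a: R[s][a] + sum(P[s][a][s2] * next_v[s2] for s2 in STATES)
                for a in ACTIONS
            }
            for s in STATES
        }
        # Optimal value of each state at stage t, used by stage t - 1.
        next_v = {s: max(q[t][s].values()) for s in STATES}
    return q


def greedy_policy(q):
    """Extract a policy by picking, per stage and state, an action maximizing Q."""
    return [
        {s: max(ACTIONS, key=lambda a: q_t[s][a]) for s in STATES}
        for q_t in q
    ]


q_star = optimal_q(horizon=3)
policy = greedy_policy(q_star)
print(policy)
```

Here `policy[t][s]` is the greedy action for state `s` at stage `t`. In a Dec-POMDP each agent observes only its own actions and observations, so this kind of maximization cannot be carried out independently per agent; that gap is what motivates the paper's question of how to define Q-value functions and extract policies in the decentralized setting.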
