HSVI can solve zero-sum Partially Observable Stochastic Games

Aurélien Delage; Olivier Buffet; Jilles S. Dibangoye; Abdallah Saffidine

arXiv:2210.14640·cs.AI·October 27, 2022

Open Access

TL;DR

This paper adapts HSVI (heuristic search value iteration) into a dynamic programming-based solver for zero-sum partially observable stochastic games (zs-POSGs). The solver comes with convergence guarantees and empirical validation, extending the toolkit for these games beyond linear programming and regret minimization.

Contribution

The paper defines an equivalent game, proves properties of that game's optimal value function, and develops an HSVI-like solver with proven finite-time convergence for general zs-POSGs.

Findings

01

The HSVI-like solver converges to an ε-optimal solution in finite time.

02

Mathematical properties of the value function enable deriving bounds and strategies.

03

Empirical analysis demonstrates the effectiveness of the proposed method.
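To make the ε-optimality claim in finding 01 concrete, here is a minimal sketch of the stopping rule that HSVI-style solvers share: keep an upper and a lower bound on the optimal value, tighten both with Bellman backups, and stop once the gap is at most ε. This is not the paper's algorithm. As a loudly simplified stand-in, it uses a fully observable, single-agent discounted MDP with made-up transitions and rewards, and it sweeps over all states instead of following heuristic search trajectories. Every name in it (P, R, GAMMA, EPSILON, backup, solve) is hypothetical.

```python
# Toy sketch, NOT the paper's solver: a 2-state discounted MDP stands in
# for the zs-POSG, and full sweeps replace heuristic trajectory selection.
GAMMA = 0.9      # discount factor (assumed)
EPSILON = 1e-3   # target gap between the bounds (assumed)

# P[s][a] = list of (next_state, probability); R[s][a] = immediate reward.
P = {
    0: {0: [(0, 0.8), (1, 0.2)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 0.6), (0, 0.4)]},
}
R = {0: {0: 0.0, 1: 0.5}, 1: {0: 1.0, 1: 0.2}}


def backup(values, s):
    """One Bellman optimality backup of `values` at state s."""
    return max(
        R[s][a] + GAMMA * sum(p * values[t] for t, p in P[s][a])
        for a in P[s]
    )


def solve(epsilon=EPSILON):
    """Tighten both bounds until they are within epsilon at every state."""
    rewards = [r for s in R for r in R[s].values()]
    # Trivial initial bounds: worst and best reward received forever.
    lower = {s: min(rewards) / (1 - GAMMA) for s in P}
    upper = {s: max(rewards) / (1 - GAMMA) for s in P}
    sweeps = 0
    while max(upper[s] - lower[s] for s in P) > epsilon:
        # Synchronous backups shrink the largest gap by a factor GAMMA.
        lower = {s: backup(lower, s) for s in P}
        upper = {s: backup(upper, s) for s in P}
        sweeps += 1
    return lower, upper, sweeps


lower, upper, sweeps = solve()
for s in P:
    print(f"state {s}: value in [{lower[s]:.4f}, {upper[s]:.4f}]")
```

Because each sweep contracts the largest gap by the discount factor, the loop ends after finitely many sweeps. In the zs-POSG setting the bounds are defined over a richer statistic than a state, so the paper's finite-time argument is necessarily more involved than this contraction.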

Abstract

State-of-the-art methods for solving 2-player zero-sum imperfect information games rely on linear programming or regret minimization, though not on dynamic programming (DP) or heuristic search (HS), while the latter are often at the core of state-of-the-art solvers for other sequential decision-making problems. In partially observable or collaborative settings (e.g., POMDPs and Dec-POMDPs), DP and HS require introducing an appropriate statistic that induces a fully observable problem as well as bounding (convex) approximators of the optimal value function. This approach has succeeded in some subclasses of 2-player zero-sum partially observable stochastic games (zs-POSGs) as well, but how to apply it in the general case still remains an open question. We answer it by (i) rigorously defining an equivalent game to work with, (ii) proving mathematical properties of the optimal value…

Taxonomy

Topics: Decision-Making and Behavioral Economics · Game Theory and Applications · Risk and Portfolio Optimization