HSVI for zs-POSGs using Concavity, Convexity and Lipschitz Properties
Aurélien Delage, Olivier Buffet, Jilles Dibangoye

TL;DR
This paper introduces a novel HSVI-based solver for zero-sum partially observable stochastic games that leverages concavity, convexity, and Lipschitz properties to ensure convergence to an ε-optimal solution.
Contribution
It develops a new bounding approach and efficient operators for zs-POSGs, enabling a provably convergent solver inspired by HSVI.
Findings
The solver converges to an ε-optimal solution in finite time.
Empirical results demonstrate improved performance over heuristic methods.
The approach extends HSVI techniques to the general case of zs-POSGs.
Abstract
Dynamic programming and heuristic search are at the core of state-of-the-art solvers for sequential decision-making problems. In partially observable or collaborative settings (e.g., POMDPs and Dec-POMDPs), this requires introducing an appropriate statistic that induces a fully observable problem, as well as bounding (convex) approximators of the optimal value function. This approach has succeeded in some subclasses of 2-player zero-sum partially observable stochastic games (zs-POSGs) as well, but failed in the general case despite known concavity and convexity properties, which only led to heuristic algorithms with poor convergence guarantees. We overcome this issue, leveraging these properties to derive bounding approximators and efficient update and selection operators, before deriving a prototypical solver inspired by HSVI that provably converges to an ε-optimal solution…
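As a rough illustration of how a Lipschitz property alone can yield bounding approximators, the sketch below builds pointwise upper and lower bounds from a finite set of stored points and measures the gap between them, which an HSVI-style solver would drive below ε. This is a generic construction, not the paper's operators (which also exploit concavity and convexity). The function names, the L1 distance, the constant `lam`, and the sample points are all assumptions made for the example.

```python
from typing import Sequence, Tuple, List

# A stored point: (location, value known to bound the true function there).
Point = Tuple[Sequence[float], float]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def lipschitz_upper_bound(x: Sequence[float], points: List[Point], lam: float) -> float:
    """Upper bound at x: min_i (v_i + lam * ||x - x_i||_1).

    Valid if the true function is lam-Lipschitz in L1 and each v_i
    upper-bounds it at x_i (assumptions of this sketch).
    """
    return min(v + lam * l1_distance(x, xi) for xi, v in points)


def lipschitz_lower_bound(x: Sequence[float], points: List[Point], lam: float) -> float:
    """Lower bound at x: max_i (v_i - lam * ||x - x_i||_1)."""
    return max(v - lam * l1_distance(x, xi) for xi, v in points)


def bound_gap(x, upper_points, lower_points, lam) -> float:
    """Width of the bounding interval at x; a solver would stop once this is <= epsilon."""
    return lipschitz_upper_bound(x, upper_points, lam) - lipschitz_lower_bound(x, lower_points, lam)


# Hypothetical data: two stored points for each bound over a 2-dimensional simplex.
upper_points = [((1.0, 0.0), 2.0), ((0.0, 1.0), 1.0)]
lower_points = [((1.0, 0.0), 0.5), ((0.0, 1.0), 0.0)]
lam = 1.0

print(bound_gap((0.5, 0.5), upper_points, lower_points, lam))
print(bound_gap((0.0, 1.0), upper_points, lower_points, lam))
```

The gap is smallest at stored points and widens with distance from them, which is why adding points along explored trajectories tightens the bounds where it matters.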
Taxonomy
Topics: Auction Theory and Applications · Decision-Making and Behavioral Economics · Experimental Behavioral Economics Studies
