On Bellman's Optimality Principle for zs-POSGs

Olivier Buffet; Jilles Dibangoye; Aurélien Delage; Abdallah Saffidine; Vincent Thomas

arXiv:2006.16395 · cs.AI · November 16, 2022 · 1 citation

TL;DR

This paper extends Bellman's optimality principle to infinite-horizon two-player zero-sum partially observable stochastic games (zs-POSGs) by recasting them as zero-sum occupancy Markov games (zs-OMGs). Exploiting the Lipschitz continuity of the value function in occupancy space, it derives a variant of HSVI that provably finds an ε-Nash equilibrium in finite time.

Contribution

The paper shows how Bellman's optimality principle can be applied to zs-POSGs through occupancy states, and derives an HSVI-based algorithm that provably finds an ε-Nash equilibrium in finite time.

Findings

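01. The HSVI-based method provably finds an ε-Nash equilibrium in finite time.

02. Lipschitz continuity of the value function in occupancy space is what makes the HSVI-style value iteration possible.

03. Recasting a zs-POSG as a zero-sum occupancy Markov game (zs-OMG) lets Bellman's optimality principle be applied from a central planner's viewpoint.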
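To illustrate why Lipschitz continuity is useful for bounding a value function, here is a minimal sketch of the generic construction: if a function is λ-Lipschitz and its value is known at a few points, every other point inherits an upper and a lower bound. This is not the paper's exact bound representation. The L1 distance, the dictionary encoding of occupancy states, the constant `lam`, and the sample values are all assumptions made for illustration.

```python
def l1_distance(p, q):
    """L1 distance between two distributions stored as dicts (missing keys count as 0)."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def lipschitz_bounds(query, samples, lam):
    """Bounds on f(query) for a lam-Lipschitz f, given (point, f(point)) samples.

    Each sample yields f(query) <= v + lam * d and f(query) >= v - lam * d,
    so the tightest upper bound is a min and the tightest lower bound a max.
    """
    upper = min(v + lam * l1_distance(query, point) for point, v in samples)
    lower = max(v - lam * l1_distance(query, point) for point, v in samples)
    return lower, upper


# Hypothetical data: two sampled points with assumed values, and lam = 1.
samples = [({"a": 1.0}, 0.0), ({"b": 1.0}, 1.0)]
query = {"a": 0.5, "b": 0.5}
lower, upper = lipschitz_bounds(query, samples, lam=1.0)
print(lower, upper)
```

An HSVI-style search would then keep exploring until the gap between such upper and lower bounds at the initial point falls below ε. How the paper actually stores and updates its bounds is not stated on this page.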

Abstract

Many non-trivial sequential decision-making problems are efficiently solved by relying on Bellman's optimality principle, i.e., exploiting the fact that sub-problems are nested recursively within the original problem. Here we show how it can apply to (infinite horizon) 2-player zero-sum partially observable stochastic games (zs-POSGs) by (i) taking a central planner's viewpoint, which can only reason on a sufficient statistic called occupancy state, and (ii) turning such problems into zero-sum occupancy Markov games (zs-OMGs). Then, exploiting the Lipschitz-continuity of the value function in occupancy space, one can derive a version of the HSVI algorithm (Heuristic Search Value Iteration) that provably finds an $\epsilon$-Nash equilibrium in finite time.
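The occupancy state mentioned in the abstract can be made concrete with a small sketch. It assumes the common definition of an occupancy state as a distribution over the hidden state and both players' action-observation histories, and it pushes that distribution one step forward under a pair of decision rules. The function names, the dictionary encoding, and the toy game below are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def next_occupancy(occupancy, rule1, rule2, dynamics):
    """One-step update of an occupancy state.

    occupancy: dict mapping (state, history1, history2) -> probability
    rule1, rule2: functions mapping a player's own history -> {action: probability}
    dynamics: function (state, a1, a2) -> {(next_state, obs1, obs2): probability}
    """
    nxt = defaultdict(float)
    for (s, h1, h2), p in occupancy.items():
        for a1, p1 in rule1(h1).items():
            for a2, p2 in rule2(h2).items():
                for (s2, z1, z2), pt in dynamics(s, a1, a2).items():
                    # Each player appends only their own (action, observation) pair.
                    key = (s2, h1 + ((a1, z1),), h2 + ((a2, z2),))
                    nxt[key] += p * p1 * p2 * pt
    return dict(nxt)


def uniform_rule(history):
    """Hypothetical decision rule: ignore the history and randomise uniformly."""
    return {"L": 0.5, "R": 0.5}


def toy_dynamics(s, a1, a2):
    """Hypothetical two-state game: the state flips when actions differ,
    and both players then observe the new state without noise."""
    s2 = s if a1 == a2 else 1 - s
    return {(s2, s2, s2): 1.0}


sigma0 = {(0, (), ()): 1.0}  # start in state 0 with empty histories
sigma1 = next_occupancy(sigma0, uniform_rule, uniform_rule, toy_dynamics)
print(len(sigma1), sum(sigma1.values()))
```

Because the update depends only on the current occupancy state and the two decision rules, the central planner can treat occupancy states as the states of a Markov game, which is the reformulation the abstract describes.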

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Auction Theory and Applications