Actor-Dual-Critic Dynamics for Zero-sum and Identical-Interest Stochastic Games
Ahmed Said Donmez, Yuksel Arslantas, Muhammed O. Sayin

TL;DR
This paper introduces a new decentralized, payoff-based learning algorithm for stochastic games. The algorithm converges to (approximate) equilibria in two-agent zero-sum and multi-agent identical-interest settings, and the paper supports this with theoretical guarantees and empirical validation.
Contribution
It presents a novel actor-dual-critic framework that is model-free, game-agnostic, and gradient-free, with proven convergence in two-agent zero-sum and multi-agent identical-interest stochastic games.
Findings
Converges to (approximate) equilibria in two-agent zero-sum and multi-agent identical-interest stochastic games over an infinite horizon.
Demonstrates robustness and effectiveness through empirical experiments.
Provides one of the first payoff-based, fully decentralized learning algorithms with theoretical guarantees for these settings.
Abstract
We propose a novel independent and payoff-based learning framework for stochastic games that is model-free, game-agnostic, and gradient-free. The learning dynamics follow a best-response-type actor-critic architecture, where agents update their strategies (actors) using feedback from two distinct critics: a fast critic that intuitively responds to observed payoffs under limited information, and a slow critic that deliberatively approximates the solution to the underlying dynamic programming problem. Crucially, the learning process relies on non-equilibrium adaptation through smoothed best responses to observed payoffs. We establish convergence to (approximate) equilibria in two-agent zero-sum and multi-agent identical-interest stochastic games over an infinite horizon. This provides one of the first payoff-based and fully decentralized learning algorithms with theoretical guarantees in…
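The abstract describes the actor-dual-critic loop only in words. The Python sketch below shows one way such a loop could look in a two-agent zero-sum game, under our own assumptions: a logit (softmax) smoothed best response with temperature `tau`, polynomial step-size schedules chosen so that the slow critic moves on the slowest timescale, and a fast critic that bootstraps from the slow critic. All names (`DualCriticAgent`, `random_zero_sum_game`, `run`) and all schedules are hypothetical; this is not the authors' algorithm, whose exact update rules and conditions are given in the paper.

```python
import math
import random


def smoothed_best_response(q, tau):
    """Logit (softmax) response to action values q at temperature tau > 0."""
    m = max(q)  # subtract the max for numerical stability
    w = [math.exp((x - m) / tau) for x in q]
    z = sum(w)
    return [x / z for x in w]


def sample(probs, rng):
    """Draw an index from the discrete distribution probs."""
    u, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1


class DualCriticAgent:
    """Hypothetical independent learner: an actor plus a fast and a slow critic.

    The agent uses only its own state, action, and payoff; it never observes
    the opponent's action or the game model.
    """

    def __init__(self, n_states, n_actions, tau=0.1, gamma=0.9):
        self.tau, self.gamma = tau, gamma
        self.pi = [[1.0 / n_actions] * n_actions for _ in range(n_states)]  # actor
        self.q = [[0.0] * n_actions for _ in range(n_states)]  # fast critic
        self.v = [0.0] * n_states  # slow critic
        self.n_s = [0] * n_states
        self.n_sa = [[0] * n_actions for _ in range(n_states)]

    def act(self, s, rng):
        return sample(self.pi[s], rng)

    def update(self, s, a, r, s_next):
        self.n_s[s] += 1
        self.n_sa[s][a] += 1
        alpha = self.n_sa[s][a] ** -0.6  # fast critic step (assumed schedule)
        lam = self.n_s[s] ** -0.7        # actor step (assumed schedule)
        beta = self.n_s[s] ** -1.0       # slow critic step: decays fastest, slowest timescale
        # Fast critic: react to the observed payoff, bootstrapping with the slow critic.
        target = r + self.gamma * self.v[s_next]
        self.q[s][a] += alpha * (target - self.q[s][a])
        # Actor: non-equilibrium adaptation toward the smoothed best response.
        br = smoothed_best_response(self.q[s], self.tau)
        self.pi[s] = [p + lam * (b - p) for p, b in zip(self.pi[s], br)]
        # Slow critic: track the fast critic's value under the current strategy.
        value = sum(p * x for p, x in zip(self.pi[s], self.q[s]))
        self.v[s] += beta * (value - self.v[s])


def random_zero_sum_game(n_states, n_actions, rng):
    """Random payoffs r(s, a0, a1) in [-1, 1] and kernels P(. | s, a0, a1)."""
    R, P = [], []
    for _ in range(n_states):
        R.append([[rng.uniform(-1, 1) for _ in range(n_actions)] for _ in range(n_actions)])
        rows = []
        for _ in range(n_actions):
            row = []
            for _ in range(n_actions):
                w = [rng.random() + 1e-3 for _ in range(n_states)]
                z = sum(w)
                row.append([x / z for x in w])
            rows.append(row)
        P.append(rows)
    return R, P


def run(n_states=3, n_actions=2, steps=20000, seed=0):
    rng = random.Random(seed)
    R, P = random_zero_sum_game(n_states, n_actions, rng)
    agents = [DualCriticAgent(n_states, n_actions) for _ in range(2)]
    s = 0
    for _ in range(steps):
        a0, a1 = agents[0].act(s, rng), agents[1].act(s, rng)
        s_next = sample(P[s][a0][a1], rng)
        r = R[s][a0][a1]
        agents[0].update(s, a0, r, s_next)   # agent 0 receives r
        agents[1].update(s, a1, -r, s_next)  # zero-sum: agent 1 receives -r
        s = s_next
    return agents
```

Calling `run()` returns the two agents; `agents[0].pi[s]` is agent 0's mixed strategy at state `s`. Each agent updates only from its own action and payoff, which is what makes the scheme independent and payoff-based. The sketch itself carries no convergence guarantee; the paper's guarantees apply to its own update rules and step-size conditions.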
Taxonomy
Topics: Adaptive Dynamic Programming Control · Game Theory and Applications · Reinforcement Learning in Robotics
