Actor-Dual-Critic Dynamics for Zero-sum and Identical-Interest Stochastic Games

Ahmed Said Donmez; Yuksel Arslantas; Muhammed O. Sayin

arXiv:2602.00606 · cs.LG · February 3, 2026

TL;DR

This paper introduces a new decentralized, payoff-based learning algorithm for stochastic games that converges to (approximate) equilibria in two-agent zero-sum and multi-agent identical-interest settings, backed by convergence proofs and empirical validation.

Contribution

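It presents a novel actor-critic framework that is model-free, game-agnostic, and gradient-free. Each agent's actor learns from two critics, a fast one and a slow one, and convergence is proven for two-agent zero-sum and multi-agent identical-interest stochastic games over an infinite horizon.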

Findings

1. Converges to approximate equilibria in two-agent zero-sum and multi-agent identical-interest stochastic games.
2. Demonstrates robustness and effectiveness through empirical experiments.
3. Provides one of the first payoff-based, fully decentralized learning algorithms with theoretical guarantees for these settings.

Abstract

We propose a novel independent and payoff-based learning framework for stochastic games that is model-free, game-agnostic, and gradient-free. The learning dynamics follow a best-response-type actor-critic architecture, where agents update their strategies (actors) using feedback from two distinct critics: a fast critic that intuitively responds to observed payoffs under limited information, and a slow critic that deliberatively approximates the solution to the underlying dynamic programming problem. Crucially, the learning process relies on non-equilibrium adaptation through smoothed best responses to observed payoffs. We establish convergence to (approximate) equilibria in two-agent zero-sum and multi-agent identical-interest stochastic games over an infinite horizon. This provides one of the first payoff-based and fully decentralized learning algorithms with theoretical guarantees in…
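The abstract describes the architecture only in words, so the following is a minimal sketch under stated assumptions, not the authors' algorithm. The logit (softmax) smoothed best response, the constant step sizes `alpha`, `beta`, `delta`, the temperature `tau`, the exact update rules, and all names (`smoothed_best_response`, `Agent`, `run`) are hypothetical choices made for illustration. Each agent observes only the current state, its own action, and its own payoff. The fast critic `q` updates the value of the played action from the observed payoff plus the discounted slow-critic estimate of the next state, the actor moves toward a smoothed best response to `q`, and the slow critic `v` drifts on the slowest timescale toward the value of the current strategy.

```python
import math
import random


def smoothed_best_response(q, tau):
    """Logit (softmax) smoothed best response to action values q, temperature tau > 0."""
    m = max(q)  # subtract the max for numerical stability
    w = [math.exp((x - m) / tau) for x in q]
    z = sum(w)
    return [x / z for x in w]


def sample(probs, rng):
    """Draw an action index according to the probability vector probs."""
    u, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1  # guard against floating-point round-off


class Agent:
    """Hypothetical independent learner with an actor, a fast critic, and a slow critic."""

    def __init__(self, n_states, n_actions):
        self.pi = [[1.0 / n_actions] * n_actions for _ in range(n_states)]  # actor
        self.q = [[0.0] * n_actions for _ in range(n_states)]  # fast critic
        self.v = [0.0] * n_states  # slow critic

    def update(self, s, a, r, s_next, gamma, tau, alpha, beta, delta):
        # Fast critic: update only the played action from the observed payoff,
        # using the slow critic's estimate of the next state's value.
        self.q[s][a] += beta * (r + gamma * self.v[s_next] - self.q[s][a])
        # Actor: take a small step toward the smoothed best response to q.
        br = smoothed_best_response(self.q[s], tau)
        self.pi[s] = [(1 - alpha) * p + alpha * b for p, b in zip(self.pi[s], br)]
        # Slow critic: drift toward the value of the current strategy under q.
        target = sum(p * x for p, x in zip(self.pi[s], self.q[s]))
        self.v[s] += delta * (target - self.v[s])


def run(reward, transition, n_states, n_actions, zero_sum=True, steps=20000,
        gamma=0.9, tau=0.1, alpha=0.01, beta=0.05, delta=0.001, seed=0):
    """Simulate two independent learners; reward(s, a1, a2) is agent 1's payoff."""
    rng = random.Random(seed)
    agents = [Agent(n_states, n_actions), Agent(n_states, n_actions)]
    s = 0
    for _ in range(steps):
        a1 = sample(agents[0].pi[s], rng)
        a2 = sample(agents[1].pi[s], rng)
        r1 = reward(s, a1, a2)
        r2 = -r1 if zero_sum else r1  # zero-sum vs. identical-interest payoffs
        s_next = transition(s, a1, a2, rng)
        # Each agent observes only the state, its own action, and its own payoff.
        agents[0].update(s, a1, r1, s_next, gamma, tau, alpha, beta, delta)
        agents[1].update(s, a2, r2, s_next, gamma, tau, alpha, beta, delta)
        s = s_next
    return agents


if __name__ == "__main__":
    # Matching pennies viewed as a one-state stochastic game.
    pennies = lambda s, a1, a2: 1.0 if a1 == a2 else -1.0
    stay = lambda s, a1, a2, rng: 0
    agents = run(pennies, stay, n_states=1, n_actions=2)
    for i, ag in enumerate(agents, start=1):
        print(f"agent {i} strategy:", [round(p, 3) for p in ag.pi[0]])
```

In this sketch, `zero_sum=False` gives both agents the same payoff, which is only the two-agent special case of the identical-interest setting; the paper's identical-interest results cover more than two agents, which the sketch does not implement. Constant step sizes are a simplification made here for brevity.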

Taxonomy

Topics: Adaptive Dynamic Programming Control · Game Theory and Applications · Reinforcement Learning in Robotics