Best-Response Dynamics and Fictitious Play in Identical-Interest and Zero-Sum Stochastic Games
Lucas Baudin, Rida Laraki

TL;DR
This paper introduces three reinforcement learning procedures that combine Q-learning and fictitious play and converge to the set of stationary mixed Nash equilibria in identical-interest discounted stochastic games. The results are supported by theoretical analysis and numerical experiments.
Contribution
It establishes convergence results for three continuous-time dynamics and their discrete-time analogues in identical-interest discounted stochastic games, generalizing the best-response dynamics that Leslie et al. defined for zero-sum discounted stochastic games.
Findings
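Under assumptions that depend on the system, the three continuous-time dynamics converge to the set of stationary equilibria.
The three analogous discrete-time procedures also converge to that set; the proof combines the continuous-time results with stochastic approximation techniques.
Numerical experiments complement the theoretical results.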
Abstract
This paper combines ideas from Q-learning and fictitious play to define three reinforcement learning procedures which converge to the set of stationary mixed Nash equilibria in identical interest discounted stochastic games. First, we analyse three continuous-time systems that generalize the best-response dynamics defined by Leslie et al. for zero-sum discounted stochastic games. Under some assumptions depending on the system, the dynamics are shown to converge to the set of stationary equilibria in identical interest discounted stochastic games. Then, we introduce three analog discrete-time procedures in the spirit of Sayin et al. and demonstrate their convergence to the set of stationary equilibria using our results in continuous time together with stochastic approximation techniques. Some numerical experiments complement our theoretical findings.
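To make the combination of Q-learning and fictitious play more concrete, here is a minimal Python sketch. It is not one of the paper's three procedures. It is a simplified illustration under assumptions that are not taken from the paper: two players, a common payoff, a tiny hand-made game (`REWARDS`, `TRANSITIONS`), a single step size of 1/(n+1) per state for both beliefs and values, and the hypothetical function names `auxiliary_q` and `run_sketch`. In each visited state, both players best-respond to the other's empirical action in an auxiliary game built from the current value estimates, and the value estimate is then moved towards the expected auxiliary payoff.

```python
import random

# Hypothetical 2-state, 2x2-action common-payoff game (not from the paper).
# REWARDS[s][a][b] is the payoff both players receive in state s.
REWARDS = [
    [[1.0, 0.0], [0.0, 0.5]],
    [[0.0, 0.0], [0.0, 2.0]],
]
# TRANSITIONS[s][a][b] is a probability distribution over next states.
TRANSITIONS = [
    [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]],
    [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]],
]


def auxiliary_q(state, values, rewards, transitions, gamma):
    """Return Q[a][b] = r(s, a, b) + gamma * E[V(s') | s, a, b]."""
    n_a = len(rewards[state])
    n_b = len(rewards[state][0])
    return [
        [
            rewards[state][a][b]
            + gamma * sum(p * values[s2]
                          for s2, p in enumerate(transitions[state][a][b]))
            for b in range(n_b)
        ]
        for a in range(n_a)
    ]


def run_sketch(rewards, transitions, gamma=0.9, steps=5000, seed=0):
    """Run state-wise fictitious play with a Q-learning-style value update."""
    rng = random.Random(seed)
    n_states = len(rewards)
    n_a = len(rewards[0])
    n_b = len(rewards[0][0])
    # x[s], y[s]: empirical mixed actions of players 1 and 2 in state s.
    x = [[1.0 / n_a] * n_a for _ in range(n_states)]
    y = [[1.0 / n_b] * n_b for _ in range(n_states)]
    values = [0.0] * n_states
    visits = [0] * n_states
    state = 0
    for _ in range(steps):
        visits[state] += 1
        step = 1.0 / (visits[state] + 1)  # assumed step size, same for all updates
        q = auxiliary_q(state, values, rewards, transitions, gamma)
        # Each player best-responds to the other's empirical action here.
        a = max(range(n_a),
                key=lambda i: sum(q[i][j] * y[state][j] for j in range(n_b)))
        b = max(range(n_b),
                key=lambda j: sum(q[i][j] * x[state][i] for i in range(n_a)))
        # Fictitious-play averaging of the actions actually played.
        x[state] = [(1 - step) * p + step * (1.0 if i == a else 0.0)
                    for i, p in enumerate(x[state])]
        y[state] = [(1 - step) * p + step * (1.0 if j == b else 0.0)
                    for j, p in enumerate(y[state])]
        # Move the value estimate towards the expected auxiliary payoff.
        target = sum(x[state][i] * q[i][j] * y[state][j]
                     for i in range(n_a) for j in range(n_b))
        values[state] += step * (target - values[state])
        # Sample the next state from the transition kernel.
        state = rng.choices(range(n_states), weights=transitions[state][a][b])[0]
    return x, y, values


if __name__ == "__main__":
    x, y, values = run_sketch(REWARDS, TRANSITIONS)
    print("player 1 empirical actions per state:", x)
    print("player 2 empirical actions per state:", y)
    print("value estimates per state:", values)
```

The sketch keeps one belief vector per state and per player, which is what lets the averaging step be read as fictitious play in each state's auxiliary game. The choice of step sizes and the order of the belief and value updates are exactly the kind of details in which the paper's procedures may differ from this illustration.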
Taxonomy
Topics: Economic theories and models · Game Theory and Applications · Auction Theory and Applications
