Convergent Q-Learning for Infinite-Horizon General-Sum Markov Games through Behavioral Economics

Yizhou Zhang; Eric Mazumdar

arXiv:2508.08669 · cs.GT · August 13, 2025

TL;DR

This paper introduces a convergent Q-learning algorithm for infinite-horizon general-sum Markov games that incorporates risk-aversion and bounded rationality through the risk-averse quantal-response equilibrium, aligning more closely with human decision-making.

Contribution

It extends the analysis of risk-averse quantal-response equilibria (RQE) to discounted infinite-horizon Markov games and provides a Q-learning algorithm that converges in that setting.

Findings

01. Proved uniqueness and Lipschitz continuity of RQE under monotonicity.

02. Established contraction of the risk-averse Bellman operator.

03. Developed a convergent Q-learning algorithm for infinite-horizon games.
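
To illustrate what a contraction property means for a Bellman-type operator, here is a minimal sketch in Python. It uses the standard single-agent discounted Bellman optimality operator as a stand-in, not the paper's risk-averse operator, whose definition is not given on this page. The names `bellman_optimality`, `random_mdp`, and `contraction_ratio` are hypothetical and exist only for this example.

```python
import random


def bellman_optimality(q, rewards, transitions, gamma):
    """Apply the standard discounted Bellman optimality operator to a Q-table.

    Assumption: this is the classical single-agent operator, used only as a
    stand-in for the risk-averse operator discussed in the paper.
    """
    n_states, n_actions = len(q), len(q[0])
    new_q = [[0.0] * n_actions for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            expected_next = sum(
                transitions[s][a][s2] * max(q[s2]) for s2 in range(n_states)
            )
            new_q[s][a] = rewards[s][a] + gamma * expected_next
    return new_q


def sup_distance(q1, q2):
    """Sup-norm distance between two Q-tables of the same shape."""
    return max(abs(x - y) for r1, r2 in zip(q1, q2) for x, y in zip(r1, r2))


def random_mdp(n_states, n_actions, seed=0):
    """Build a small random MDP (hypothetical test data, not from the paper)."""
    rng = random.Random(seed)
    rewards = [[rng.uniform(-1.0, 1.0) for _ in range(n_actions)]
               for _ in range(n_states)]
    transitions = []
    for _ in range(n_states):
        per_action = []
        for _ in range(n_actions):
            weights = [rng.random() + 1e-6 for _ in range(n_states)]
            total = sum(weights)
            per_action.append([w / total for w in weights])
        transitions.append(per_action)
    return rewards, transitions


def contraction_ratio(gamma=0.9, n_states=4, n_actions=3, seed=0):
    """Return ||T q1 - T q2|| / ||q1 - q2|| in the sup norm for random q1, q2."""
    rewards, transitions = random_mdp(n_states, n_actions, seed)
    rng = random.Random(seed + 1)
    q1 = [[rng.uniform(-5.0, 5.0) for _ in range(n_actions)]
          for _ in range(n_states)]
    q2 = [[rng.uniform(-5.0, 5.0) for _ in range(n_actions)]
          for _ in range(n_states)]
    before = sup_distance(q1, q2)
    after = sup_distance(
        bellman_optimality(q1, rewards, transitions, gamma),
        bellman_optimality(q2, rewards, transitions, gamma),
    )
    return after / before
```

For this classical operator the ratio returned by `contraction_ratio(gamma)` never exceeds `gamma`, which is why repeated application converges to a unique fixed point. The finding above states an analogous contraction property for the risk-averse operator.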

Abstract

Risk-aversion and bounded rationality are two key characteristics of human decision-making. Risk-averse quantal-response equilibrium (RQE) is a solution concept that incorporates these features, providing a more realistic depiction of human decision-making in various strategic environments than a Nash equilibrium. Furthermore, a class of RQE has recently been shown in arXiv:2406.14156 to be universally computationally tractable in all finite-horizon Markov games, allowing for the development of multi-agent reinforcement learning algorithms with convergence guarantees. In this paper, we expand upon the study of RQE and analyze their computation in both two-player normal-form games and discounted infinite-horizon Markov games. For normal-form games, we adopt a monotonicity-based approach that allows us to generalize previous results. We first show uniqueness and Lipschitz continuity of…
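
As a rough illustration of the bounded-rationality ingredient only, the sketch below computes a logit quantal-response equilibrium of a two-player normal-form game by damped fixed-point iteration. It is an assumption-laden toy: it does not model risk-aversion, it is not the algorithm from the paper, and the function names, the `temperature` parameter, and the damping `step` are hypothetical. Convergence of this simple iteration is only expected when the temperature is large relative to the payoffs.

```python
import math


def softmax(values, temperature):
    """Logit choice probabilities; higher temperature means noisier choices."""
    scaled = [v / temperature for v in values]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def expected_payoffs(payoff, opponent_strategy):
    """payoff[i][j] is the payoff for own action i against opponent action j."""
    return [sum(p * x for p, x in zip(opponent_strategy, row)) for row in payoff]


def logit_qre(payoff_row, payoff_col, temperature=2.0, iters=2000, step=0.5):
    """Damped fixed-point iteration for a logit quantal-response equilibrium.

    payoff_row[i][j] and payoff_col[i][j] are the payoffs to the row and
    column player when row plays i and column plays j. Risk-aversion is
    NOT modeled here; this covers quantal response only.
    """
    n, m = len(payoff_row), len(payoff_row[0])
    x = [1.0 / n] * n  # row player's mixed strategy
    y = [1.0 / m] * m  # column player's mixed strategy
    # Re-index the column player's payoffs by (own action, opponent action).
    col_view = [[payoff_col[i][j] for i in range(n)] for j in range(m)]
    for _ in range(iters):
        best_x = softmax(expected_payoffs(payoff_row, y), temperature)
        best_y = softmax(expected_payoffs(col_view, x), temperature)
        x = [(1.0 - step) * a + step * b for a, b in zip(x, best_x)]
        y = [(1.0 - step) * a + step * b for a, b in zip(y, best_y)]
    return x, y
```

In matching pennies (`payoff_row = [[1, -1], [-1, 1]]`, `payoff_col = [[-1, 1], [1, -1]]`) the uniform strategy pair is already a fixed point of the logit response map, so the iteration stays at equal probabilities for both players.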

Taxonomy

Topics: Reinforcement Learning in Robotics · Game Theory and Applications · Risk and Portfolio Optimization