Strategically Robust Multi-Agent Reinforcement Learning with Linear Function Approximation
Jake Gonzales, Max Horwitz, Eric Mazumdar, Lillian J. Ratliff

TL;DR
This paper introduces RQRE-OVI, a new algorithm for computing risk-sensitive equilibria in multi-agent reinforcement learning, offering improved robustness and stability over traditional Nash equilibria, especially in large or continuous state spaces.
Contribution
The paper proposes RQRE-OVI, an optimistic value iteration algorithm for risk-sensitive equilibrium computation with linear function approximation, together with a finite-sample regret analysis and an analysis of its robustness properties.
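Optimistic value iteration with linear function approximation typically fits each step's Q-function by ridge regression on features and adds an exploration bonus. The sketch below shows one such optimistic backup in the style of LSVI-UCB. It is an illustration under stated assumptions, not the paper's RQRE-OVI update: the names optimistic_lsvi_step, ridge, beta and cap are hypothetical, the features are assumed to encode the state and joint action, and in a multi-agent method the optimistic Q-values would presumably feed a stage-game equilibrium solver (here, RQRE) rather than a max over actions.

```python
import numpy as np

def optimistic_lsvi_step(phi, rewards, next_values, ridge=1.0, beta=1.0, cap=np.inf):
    """One optimistic least-squares backup at a single step (LSVI-UCB style sketch).

    phi:         (K, d) features of the K observed (state, joint action) pairs
    rewards:     (K,)   observed rewards
    next_values: (K,)   current estimates of the next-step value at the observed next states
    Returns a function mapping an (N, d) feature matrix to N optimistic Q-values.
    """
    d = phi.shape[1]
    gram = ridge * np.eye(d) + phi.T @ phi           # regularized Gram matrix Lambda
    targets = rewards + next_values                  # one-step Bellman targets
    w = np.linalg.solve(gram, phi.T @ targets)       # ridge-regression weights
    gram_inv = np.linalg.inv(gram)

    def q_optimistic(features):
        # Elliptical bonus beta * sqrt(phi^T Lambda^{-1} phi): large for rarely seen directions
        bonus = beta * np.sqrt(np.sum((features @ gram_inv) * features, axis=1))
        return np.minimum(features @ w + bonus, cap)  # clip at an upper value bound

    return q_optimistic

# Toy data: direction e1 seen 9 times with reward 1, direction e2 seen once with reward 0
phi = np.array([[1.0, 0.0]] * 9 + [[0.0, 1.0]])
rewards = np.array([1.0] * 9 + [0.0])
next_values = np.zeros(10)
q = optimistic_lsvi_step(phi, rewards, next_values, ridge=1.0, beta=1.0)
print(q(np.eye(2)))  # ridge estimates [0.9, 0.0] plus bonuses [sqrt(0.1), sqrt(0.5)]
```

With beta=0 the backup reduces to plain ridge regression; the bonus shrinks as a feature direction is observed more often, which is what drives optimistic exploration.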
Findings
RQRE-OVI converges with quantifiable sample complexity.
Risk sensitivity provides regularization and enhances robustness.
RQRE policies are Lipschitz continuous and more robust than Nash policies.
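The smoothness behind the Lipschitz claim can be illustrated on a single decision. A logit (quantal) response with rationality lam is the softmax of lam times the payoffs; its Jacobian diag(p) - pp^T has spectral norm at most 1/2, so the response is (lam/2)-Lipschitz in the Euclidean norm. An exact best response, the large-lam limit underlying Nash equilibrium, can instead jump between pure actions under an arbitrarily small payoff change. This toy sketch shows that contrast only; it is not the paper's Lipschitz result for RQRE policies, and the function names are hypothetical.

```python
import numpy as np

def logit_response(u, lam):
    """Quantal (logit) response: softmax of payoffs u at rationality lam."""
    z = lam * (u - u.max())                  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def best_response(u):
    """Deterministic best response (the lam -> infinity limit); ties go to the lowest index."""
    p = np.zeros_like(u)
    p[np.argmax(u)] = 1.0
    return p

eps = 1e-3
u1 = np.array([1.0, 1.0 + eps])
u2 = np.array([1.0 + eps, 1.0])
du = np.linalg.norm(u1 - u2)

for lam in (1.0, 10.0, 100.0):
    dp = np.linalg.norm(logit_response(u1, lam) - logit_response(u2, lam))
    print(f"lam={lam:>5}: policy change {dp:.2e} <= bound {lam / 2 * du:.2e}")

dbr = np.linalg.norm(best_response(u1) - best_response(u2))
print(f"best response: policy change {dbr:.2f} for payoff change {du:.2e}")
```

Larger lam moves the logit response toward the best response and loosens the Lipschitz bound proportionally.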
Abstract
Provably efficient and robust equilibrium computation in general-sum Markov games remains a core challenge in multi-agent reinforcement learning. Nash equilibrium is computationally intractable in general and brittle due to equilibrium multiplicity and sensitivity to approximation error. We study Risk-Sensitive Quantal Response Equilibrium (RQRE), which yields a unique, smooth solution under bounded rationality and risk sensitivity. We propose RQRE-OVI, an optimistic value iteration algorithm for computing RQRE with linear function approximation in large or continuous state spaces. Through finite-sample regret analysis, we establish convergence and explicitly characterize how sample complexity scales with rationality and risk-sensitivity parameters. The regret bounds reveal a quantitative tradeoff: increasing rationality tightens regret, while risk sensitivity induces…
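RQRE replaces exact best responses with smoothed (quantal) responses to risk-adjusted payoffs. As a minimal sketch, assuming a two-player matrix game, a logit quantal response with rationality parameter lam, and the entropic risk measure with risk-aversion parameter tau (the paper may use a more general class of risk measures and a different solver), a stage-game RQRE can be approximated by damped fixed-point iteration. The function names and the damping scheme are hypothetical, and this iteration is only expected to converge when lam is small enough for the response map to be a contraction.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                                   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def entropic_risk_values(payoff, opp_strategy, tau):
    """Risk-adjusted value of each own action: -(1/tau) * log E_opp[exp(-tau * payoff)].

    payoff has shape (own_actions, opp_actions); tau > 0 means risk aversion.
    """
    z = -tau * payoff + np.log(opp_strategy)[None, :]
    m = z.max(axis=1, keepdims=True)                  # log-sum-exp over opponent actions
    lse = m[:, 0] + np.log(np.exp(z - m).sum(axis=1))
    return -lse / tau

def rqre_matrix_game(A, B, lam, tau, damping=0.5, iters=2000, tol=1e-10):
    """Damped fixed-point iteration for a logit RQRE of the bimatrix game (A, B).

    A[i, j] and B[i, j] are the payoffs of players 1 and 2 when they play i and j.
    """
    x = np.full(A.shape[0], 1.0 / A.shape[0])
    y = np.full(A.shape[1], 1.0 / A.shape[1])
    for _ in range(iters):
        x_resp = softmax(lam * entropic_risk_values(A, y, tau))
        y_resp = softmax(lam * entropic_risk_values(B.T, x, tau))
        x_next = (1 - damping) * x + damping * x_resp
        y_next = (1 - damping) * y + damping * y_resp
        done = max(np.abs(x_next - x).max(), np.abs(y_next - y).max()) < tol
        x, y = x_next, y_next
        if done:
            break
    return x, y

# Prisoner's dilemma: action 0 = cooperate, action 1 = defect
A = np.array([[3.0, 0.0], [5.0, 1.0]])
x, y = rqre_matrix_game(A, A.T, lam=1.0, tau=0.5)
print(x, y)
```

In this example both players place most, but not all, probability on defection. As tau approaches 0 the entropic risk reduces to the expected payoff, and the same iteration computes an ordinary logit quantal response equilibrium.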
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Game Theory and Applications
