Taming Equilibrium Bias in Risk-Sensitive Multi-Agent Reinforcement Learning

Yingjie Fei; Ruitu Xu

arXiv:2405.02724·cs.LG·May 7, 2024

TL;DR

This paper introduces risk-balanced regret, a new performance measure for multi-agent reinforcement learning in risk-sensitive general-sum Markov games. The measure avoids the equilibrium bias induced by naively adapted regret, and the paper gives a self-play algorithm with near-optimal guarantees under it.

Contribution

It proposes the notion of risk-balanced regret, shows through a lower bound that it overcomes equilibrium bias, and develops a self-play algorithm that learns equilibria when agents have diverse risk preferences.

Findings

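01

Risk-balanced regret overcomes the equilibrium bias that naive regret induces in favor of the most risk-sensitive agents.

02

The proposed self-play algorithm attains near-optimal guarantees with respect to risk-balanced regret.

03

The algorithm learns Nash, correlated, and coarse correlated equilibria in risk-sensitive Markov games.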

Abstract

We study risk-sensitive multi-agent reinforcement learning under general-sum Markov games, where agents optimize the entropic risk measure of rewards with possibly diverse risk preferences. We show that using the regret naively adapted from existing literature as a performance metric could induce policies with equilibrium bias that favor the most risk-sensitive agents and overlook the other agents. To address such deficiency of the naive regret, we propose a novel notion of regret, which we call risk-balanced regret, and show through a lower bound that it overcomes the issue of equilibrium bias. Furthermore, we develop a self-play algorithm for learning Nash, correlated, and coarse correlated equilibria in risk-sensitive Markov games. We prove that the proposed algorithm attains near-optimal regret guarantees with respect to the risk-balanced regret.
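
As a rough illustration of the entropic risk measure mentioned in the abstract, the sketch below evaluates the standard textbook form (1/beta) * log E[exp(beta * R)] for a discrete reward distribution. The function name `entropic_risk`, the parameter name `beta`, and the toy rewards are assumptions made for this example; the abstract does not state the paper's exact parameterization or sign convention, so this should be read as a minimal sketch rather than the authors' definition.

```python
import math

def entropic_risk(rewards, probs, beta):
    """Entropic risk (1/beta) * log E[exp(beta * R)] of a discrete reward.

    Hypothetical helper for illustration; beta == 0 is treated as the
    risk-neutral limit, which is the plain expectation.
    """
    if beta == 0:
        return sum(p * r for p, r in zip(probs, rewards))
    # Shift by the largest exponent (log-sum-exp trick) for numerical stability.
    m = max(beta * r for r in rewards)
    s = sum(p * math.exp(beta * r - m) for p, r in zip(probs, rewards))
    return (m + math.log(s)) / beta

# Toy reward: 0 or 1 with equal probability (assumed data, not from the paper).
rewards = [0.0, 1.0]
probs = [0.5, 0.5]

risk_neutral = entropic_risk(rewards, probs, 0.0)   # plain mean
risk_averse = entropic_risk(rewards, probs, -2.0)   # below the mean
risk_seeking = entropic_risk(rewards, probs, 2.0)   # above the mean
```

Under this form, two agents with different values of `beta` assign different values to the same reward distribution, which is the kind of heterogeneity in risk preferences the abstract refers to.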

Taxonomy

Topics: Reinforcement Learning in Robotics