Policy Evaluation and Seeking for Multi-Agent Reinforcement Learning via Best Response

Rui Yan; Xiaoming Duan; Zongying Shi; Yisheng Zhong; Jason R. Marden; Francesco Bullo

arXiv:2006.09585 · cs.GT · June 23, 2020

Open Access

TL;DR

This paper proposes new metrics based on sink equilibrium for evaluating and ranking policies in multi-agent reinforcement learning, effectively handling cyclical behaviors and opponent non-stationarity.

Contribution

It introduces cycle-based and memory-based metrics grounded in sink equilibrium, and develops perturbed strict best response dynamics for policy evaluation in multi-agent RL.

Findings

1. The metrics can distinguish optimal policies in stochastic games.

2. Perturbed strict best response dynamics (SBRD) converge to policies with maximum metrics.

3. The approach handles cyclical and non-stationary behaviors effectively.

Abstract

This paper introduces two metrics (cycle-based and memory-based metrics), grounded on a dynamical game-theoretic solution concept called sink equilibrium, for the evaluation, ranking, and computation of policies in multi-agent learning. We adopt strict best response dynamics (SBRD) to model selfish behaviors at a meta-level for multi-agent reinforcement learning. Our approach can deal with dynamical cyclical behaviors (unlike approaches based on Nash equilibria and Elo ratings), and is more compatible with single-agent reinforcement learning than alpha-rank which relies on weakly better responses. We first consider settings where the difference between largest and second largest underlying metric has a known lower bound. With this knowledge we propose a class of perturbed SBRD with the following property: only policies with maximum metric are observed with nonzero probability for a…
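
To make the notion of a sink equilibrium concrete, here is a minimal Python sketch. It is not the paper's algorithm or its metrics. It assumes the common definition of a sink equilibrium as a sink strongly connected component of the strict best response graph, and it applies that definition to a two-player matrix game rather than to the meta-level policy setting of the paper. The function names and the two example games are hypothetical and chosen only for illustration.

```python
from itertools import product


def strict_best_response_graph(A, B):
    """Edges between joint pure strategies (i, j).

    A[i][j] is the row player's payoff, B[i][j] the column player's.
    An edge leaves (i, j) only when one player can switch to a best
    response that strictly improves that player's payoff.
    """
    n, m = len(A), len(A[0])
    edges = {}
    for i, j in product(range(n), range(m)):
        succ = set()
        best_row = max(A[k][j] for k in range(n))
        if best_row > A[i][j]:  # row player strictly improves
            succ.update((k, j) for k in range(n) if A[k][j] == best_row)
        best_col = max(B[i][l] for l in range(m))
        if best_col > B[i][j]:  # column player strictly improves
            succ.update((i, l) for l in range(m) if B[i][l] == best_col)
        edges[(i, j)] = succ
    return edges


def reachable(edges, start):
    """All nodes reachable from start, including start itself."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in edges[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def sink_equilibria(A, B):
    """Sink strongly connected components of the strict best response graph."""
    edges = strict_best_response_graph(A, B)
    reach = {u: reachable(edges, u) for u in edges}
    sinks = set()
    for u, r in reach.items():
        # u lies in a sink component iff everything it reaches can reach it back
        if all(u in reach[v] for v in r):
            sinks.add(frozenset(r))
    return sorted(sinks, key=lambda s: sorted(s))


# Hypothetical example games (index 0 = cooperate/heads, 1 = defect/tails).
PD_A = [[3, 0], [5, 1]]
PD_B = [[3, 5], [0, 1]]
MP_A = [[1, -1], [-1, 1]]
MP_B = [[-1, 1], [1, -1]]

if __name__ == "__main__":
    print(sink_equilibria(PD_A, PD_B))
    print(sink_equilibria(MP_A, MP_B))
```

In the prisoner's dilemma example the only sink is the single profile of mutual defection, which coincides with the pure Nash equilibrium. In matching pennies no pure Nash equilibrium exists, and the sink is the cycle through all four profiles, which is the kind of cyclical behavior the abstract refers to.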

Taxonomy

Topics: Game Theory and Applications · Reinforcement Learning in Robotics · Experimental Behavioral Economics Studies