Reinforcement Learning With Reward Machines in Stochastic Games

Jueming Hu, Jean-Raphael Gaglione, Yanze Wang, Zhe Xu, Ufuk Topcu, and Yongming Liu

arXiv:2305.17372 · cs.MA · August 30, 2023 · 1 citation

Open Access

TL;DR

This paper introduces Q-learning with reward machines for stochastic games, enabling multi-agent systems to learn Nash equilibrium strategies in complex, non-Markovian reward environments with proven convergence properties.

Contribution

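The paper develops QRM-SG, an algorithm that incorporates reward machines into multi-agent reinforcement learning for stochastic games. The learned Q-functions are proven to converge to the Q-functions at a Nash equilibrium, provided that the stage game at each time step during learning has a global optimum point or a saddle point.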

Findings

01. QRM-SG effectively learns best-response strategies in complex stochastic games.

02. QRM-SG converges faster than baseline methods like Nash Q-learning and MADDPG.

03. The algorithm demonstrates successful convergence in three case studies.

Abstract

We investigate multi-agent reinforcement learning for stochastic games with complex tasks, where the reward functions are non-Markovian. We utilize reward machines to incorporate high-level knowledge of complex tasks. We develop an algorithm called Q-learning with reward machines for stochastic games (QRM-SG), to learn the best-response strategy at Nash equilibrium for each agent. In QRM-SG, we define the Q-function at a Nash equilibrium in augmented state space. The augmented state space integrates the state of the stochastic game and the state of reward machines. Each agent learns the Q-functions of all agents in the system. We prove that Q-functions learned in QRM-SG converge to the Q-functions at a Nash equilibrium if the stage game at each time step during learning has a global optimum point or a saddle point, and the agents update Q-functions based on the best-response strategy at…
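The augmented state space described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not cover the Nash equilibrium computation in QRM-SG; it only shows how pairing a game state with a reward machine state makes a history-dependent reward depend on the current augmented state alone. The `RewardMachine` class, the `rollout` helper, and the "key then door" task are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RewardMachine:
    """Finite-state machine mapping (rm_state, label) -> (next_rm_state, reward).

    Hypothetical minimal structure; the paper's formal definition may differ.
    """
    initial_state: int
    transitions: Dict[Tuple[int, str], Tuple[int, float]]

    def step(self, rm_state: int, label: str) -> Tuple[int, float]:
        # Assumption: labels with no listed transition leave the machine
        # in its current state and yield zero reward.
        return self.transitions.get((rm_state, label), (rm_state, 0.0))


def rollout(rm: RewardMachine, trajectory: List[Tuple[str, str]]):
    """Walk a list of (game_state, label) pairs through the reward machine.

    Returns the visited augmented states (game_state, rm_state) and the
    total reward collected along the way.
    """
    rm_state = rm.initial_state
    augmented_states = []
    total_reward = 0.0
    for game_state, label in trajectory:
        rm_state, reward = rm.step(rm_state, label)
        augmented_states.append((game_state, rm_state))
        total_reward += reward
    return augmented_states, total_reward


# Hypothetical task: reward 1 only if "key" is observed before "door".
KEY_THEN_DOOR = RewardMachine(
    initial_state=0,
    transitions={
        (0, "key"): (1, 0.0),   # key picked up, no reward yet
        (1, "door"): (2, 1.0),  # door reached after key: task complete
    },
)

if __name__ == "__main__":
    states, total = rollout(
        KEY_THEN_DOOR, [("s0", "door"), ("s1", "key"), ("s0", "door")]
    )
    print(states, total)
```

In this sketch the game state `s0` with label `door` is visited twice but rewarded only the second time, so the reward is non-Markovian in the game state. In the augmented state the two visits are distinguishable, which is what allows a Q-function to be defined over (game state, reward machine state) pairs.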

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Batch Normalization · Adam · Convolution · Dense Connections · Weight Decay · Q-Learning · Experience Replay · MADDPG