Self-Confirming Transformer for Belief-Conditioned Adaptation in Offline Multi-Agent Reinforcement Learning

Tao Li, Juan Guevara, Xinhong Xie, and Quanyan Zhu

arXiv:2310.04579·cs.LG·February 25, 2025

Open Access

TL;DR

This paper introduces a self-confirming transformer, inspired by game-theoretic concepts, for offline multi-agent reinforcement learning. The agent adapts online to nonstationary opponents by predicting their actions and updating its beliefs accordingly.

Contribution

The paper proposes a novel auto-regressive training method that enables belief-based online adaptation of transformers in offline MARL by combining a belief-consistency loss with a best-response loss.
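As a rough illustration of how two such loss terms might be combined, the sketch below sums a belief term (how well the predicted opponent action matches the opponent action actually observed) and a response term (how well the agent's action distribution matches a target action). The excerpt here does not give the paper's actual loss, so the use of cross-entropy, the `weight` parameter, the choice of targets, and all function names are assumptions made only for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the target class."""
    probs = softmax(logits)
    return -math.log(probs[target_index])

def combined_loss(belief_logits, opponent_action,
                  policy_logits, agent_action, weight=1.0):
    """Hypothetical two-term objective (not taken from the paper).

    belief_loss:   penalizes a predicted opponent action that disagrees
                   with the opponent action observed in the data.
    response_loss: penalizes an agent action distribution that disagrees
                   with the target action (assumed here to come from the
                   offline dataset).
    """
    belief_loss = cross_entropy(belief_logits, opponent_action)
    response_loss = cross_entropy(policy_logits, agent_action)
    total = belief_loss + weight * response_loss
    return total, belief_loss, response_loss

# Example: two actions per player, scores chosen arbitrarily.
total, belief_loss, response_loss = combined_loss(
    belief_logits=[2.0, 0.0], opponent_action=0,
    policy_logits=[0.0, 1.0], agent_action=1,
)
```

In an auto-regressive model the policy scores would be computed after, and conditioned on, the predicted opponent action; the sketch keeps the two sets of scores as independent inputs for brevity.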

Findings

01. Demonstrates belief consistency in iterated prisoner's dilemma

02. Shows superior performance against nonstationary opponents in multi-particle environments

03. Outperforms prior transformers and offline MARL baselines

Abstract

Offline reinforcement learning (RL) suffers from the distribution shift between the offline dataset and the online environment. In multi-agent RL (MARL), this distribution shift may arise from the nonstationary opponents in the online testing who display distinct behaviors from those recorded in the offline dataset. Hence, the key to the broader deployment of offline MARL is the online adaptation to nonstationary opponents. Recent advances in foundation models, e.g., large language models, have demonstrated the generalization ability of the transformer, an emerging neural network architecture, in sequence modeling, of which offline RL is a special case. One naturally wonders whether offline-trained transformer-based RL policies adapt to nonstationary opponents online. We propose a novel auto-regressive training to equip transformer agents with online adaptability based on the…

Taxonomy

Topics: Reinforcement Learning in Robotics · Topic Modeling