Self-Confirming Transformer for Belief-Conditioned Adaptation in Offline Multi-Agent Reinforcement Learning
Tao Li, Juan Guevara, Xinhong Xie, and Quanyan Zhu

TL;DR
This paper introduces a self-confirming transformer that adapts online to nonstationary opponents in offline multi-agent reinforcement learning (MARL). Drawing on game-theoretic concepts, the agent predicts opponent actions and updates its beliefs about them.
Contribution
It proposes a novel auto-regressive training method for transformers to enable belief-based online adaptation in offline MARL, integrating belief consistency and best response losses.
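As a rough illustration of how a belief consistency loss and a best response loss might be combined, here is a minimal sketch. The cross-entropy form, the `weight` parameter, and all function names are assumptions made for illustration; the summary only states that the two losses are integrated, not how.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target index under a probability vector."""
    return -math.log(max(probs[target], 1e-12))


def combined_loss(belief_probs, opponent_action, policy_probs, target_action, weight=1.0):
    """Hypothetical combined objective (not the paper's exact formulation)."""
    # Belief-consistency term: how well the predicted opponent-action
    # distribution (the "belief") matches the opponent action actually observed.
    belief_loss = cross_entropy(belief_probs, opponent_action)
    # Best-response term: how well the agent's action distribution matches
    # the target action recorded in the offline data.
    response_loss = cross_entropy(policy_probs, target_action)
    # Assumed weighted sum of the two terms.
    return belief_loss + weight * response_loss


# Two-action example: index 0 and index 1 are arbitrary action labels.
loss = combined_loss([0.8, 0.2], 0, [0.1, 0.9], 1)
worse_belief_loss = combined_loss([0.3, 0.7], 0, [0.1, 0.9], 1)
```

In this sketch, a belief that assigns less probability to the observed opponent action yields a larger loss, so training would push the predicted belief toward what the opponent actually does.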
Findings
Demonstrates belief consistency in iterated prisoner's dilemma
Shows superior performance against nonstationary opponents in multi-particle environments
Outperforms prior transformers and offline MARL baselines
Abstract
Offline reinforcement learning (RL) suffers from the distribution shift between the offline dataset and the online environment. In multi-agent RL (MARL), this distribution shift may arise from the nonstationary opponents in the online testing who display distinct behaviors from those recorded in the offline dataset. Hence, the key to the broader deployment of offline MARL is the online adaptation to nonstationary opponents. Recent advances in foundation models, e.g., large language models, have demonstrated the generalization ability of the transformer, an emerging neural network architecture, in sequence modeling, of which offline RL is a special case. One naturally wonders whether offline-trained transformer-based RL policies adapt to nonstationary opponents online. We propose a novel auto-regressive training to equip transformer agents with online adaptability based on the…
Taxonomy
Topics: Reinforcement Learning in Robotics · Topic Modeling
