Multi-agent Off-policy Actor-Critic Reinforcement Learning for Partially Observable Environments
Ainur Zhaikhan, Ali H. Sayed

TL;DR
This paper introduces a decentralized multi-agent reinforcement learning algorithm that estimates global states via social learning, enabling effective off-policy actor-critic training in partially observable environments without requiring a transition model.
Contribution
It presents a novel social learning-based method for global state estimation in multi-agent off-policy RL, suitable for fully decentralized, model-free settings in partially observable environments.
Findings
The proposed algorithm is reported to outperform existing state-of-the-art methods.
Estimates of the global state are $\epsilon$-bounded with sufficient social learning iterations.
Effective in partially observable, decentralized multi-agent environments.
Abstract
This study proposes the use of a social learning method to estimate a global state within a multi-agent off-policy actor-critic algorithm for reinforcement learning (RL) operating in a partially observable environment. We assume that the network of agents operates in a fully-decentralized manner, possessing the capability to exchange variables with their immediate neighbors. The proposed design methodology is supported by an analysis demonstrating that the difference between final outcomes, obtained when the global state is fully observed versus estimated through the social learning method, is $\epsilon$-bounded when an appropriate number of iterations of social learning updates are implemented. Unlike many existing dec-POMDP-based RL approaches, the proposed algorithm is suitable for model-free multi-agent reinforcement learning as it does not require knowledge of a transition…
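The abstract does not spell out the social learning update itself. As a rough illustration only, the sketch below uses one common form from the social learning literature: each agent first performs a local Bayesian update with its own observation, then geometrically averages the intermediate beliefs of its immediate neighbors. This is an assumption and should not be read as the paper's algorithm. The three-agent line network, the combination weights, the likelihood values, and all function names are hypothetical.

```python
import math


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def social_learning_step(beliefs, likelihoods, combination):
    """One adapt-then-combine iteration (assumed form, not taken from the paper).

    beliefs[k][s]     : agent k's belief that the global state is s
    likelihoods[k][s] : likelihood of agent k's local observation under state s
    combination[l][k] : weight agent k gives to neighbor l (each column sums to 1)
    """
    n_agents = len(beliefs)
    n_states = len(beliefs[0])
    # Adapt: local Bayesian update using only the agent's own observation.
    intermediate = [
        normalize([beliefs[k][s] * likelihoods[k][s] for s in range(n_states)])
        for k in range(n_agents)
    ]
    # Combine: geometric averaging of neighbors' intermediate beliefs,
    # computed in the log domain for numerical stability.
    updated = []
    for k in range(n_agents):
        log_mix = [
            sum(
                combination[l][k] * math.log(intermediate[l][s])
                for l in range(n_agents)
                if combination[l][k] > 0
            )
            for s in range(n_states)
        ]
        peak = max(log_mix)
        updated.append(normalize([math.exp(v - peak) for v in log_mix]))
    return updated


def run_demo(n_iters=20):
    """Hypothetical line network 0 - 1 - 2 with two candidate global states."""
    combination = [
        [0.50, 0.25, 0.00],
        [0.50, 0.50, 0.50],
        [0.00, 0.25, 0.50],
    ]
    # Assumed likelihoods of each agent's observation; state 0 is the true state.
    # Agent 1 is uninformative on its own and relies on its neighbors.
    likelihoods = [[0.8, 0.2], [0.5, 0.5], [0.6, 0.4]]
    beliefs = [[0.5, 0.5] for _ in range(3)]
    for _ in range(n_iters):
        beliefs = social_learning_step(beliefs, likelihoods, combination)
    return beliefs


if __name__ == "__main__":
    for agent, belief in enumerate(run_demo()):
        print(agent, [round(p, 4) for p in belief])
```

In this toy setting the same likelihoods are reused at every iteration to keep the run deterministic. Even the agent whose own observations carry no information ends up concentrating its belief on the true state, which is the intuition behind running enough social learning iterations before the estimate is used by the actor-critic updates.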
Taxonomy
Topics: Reinforcement Learning in Robotics
