Model-Based Decentralized Policy Optimization
Hao Luo, Jiechuan Jiang, and Zongqing Lu

TL;DR
This paper introduces MDPO, a model-based approach for decentralized policy optimization in multi-agent systems, enhancing stability and performance by modeling environment dynamics and latent variables.
Contribution
The paper proposes model-based decentralized policy optimization (MDPO), which uses a latent variable function to construct each agent's transition and reward functions, aiming for more stable and monotonic policy improvement than model-free decentralized approaches.
Findings
MDPO achieves superior performance in cooperative multi-agent tasks.
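Theoretical analysis indicates that MDPO's policy optimization is more stable than model-free decentralized policy optimization.
A latent variable prediction method reduces the error of the latent variable function.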
Abstract
Decentralized policy optimization is commonly used in cooperative multi-agent tasks. However, because all agents update their policies simultaneously, the environment is non-stationary from the perspective of each individual agent, which makes monotonic policy improvement hard to guarantee. To make policy improvement stable and monotonic, we propose model-based decentralized policy optimization (MDPO), which incorporates a latent variable function to help construct the transition and reward functions from an individual perspective. We show theoretically that the policy optimization of MDPO is more stable than model-free decentralized policy optimization. Moreover, due to non-stationarity, the latent variable function varies over time and is hard to model. We further propose a latent variable prediction method to reduce the error of the latent variable function,…
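The non-stationarity argument in the abstract can be illustrated with a toy example. This is a minimal sketch, not the paper's implementation. It assumes (the abstract does not specify this) that the latent variable z summarizes the other agents' joint action, so the transition seen by one agent is the fixed joint transition marginalized over z. The table `JOINT_TRANSITION`, the function `individual_transition`, and all numbers are hypothetical.

```python
# Toy setting (all values hypothetical): 2 states, 2 actions for agent i,
# and a latent variable z with 2 values standing in for the other agents.
# JOINT_TRANSITION[(s, a_i, z)] is a fixed distribution over next states.
JOINT_TRANSITION = {
    (0, 0, 0): [0.9, 0.1],
    (0, 0, 1): [0.2, 0.8],
    (0, 1, 0): [0.5, 0.5],
    (0, 1, 1): [0.4, 0.6],
    (1, 0, 0): [0.3, 0.7],
    (1, 0, 1): [0.6, 0.4],
    (1, 1, 0): [0.1, 0.9],
    (1, 1, 1): [0.7, 0.3],
}


def individual_transition(s, a_i, latent_fn):
    """Transition seen by agent i: sum over z of latent_fn[s][z] * P(s' | s, a_i, z)."""
    probs = [0.0, 0.0]
    for z, p_z in enumerate(latent_fn[s]):
        for s_next, p in enumerate(JOINT_TRANSITION[(s, a_i, z)]):
            probs[s_next] += p_z * p
    return probs


# Assumed latent variable function (distribution over z per state) before and
# after the other agents update their policies.
latent_before = {0: [1.0, 0.0], 1: [0.5, 0.5]}
latent_after = {0: [0.0, 1.0], 1: [0.5, 0.5]}

p_before = individual_transition(0, 0, latent_before)
p_after = individual_transition(0, 0, latent_after)

print("before update:", p_before)
print("after update: ", p_after)
```

The joint transition table never changes, yet the transition that agent i observes for the same state and action shifts when the latent variable function shifts. Under the stated assumption, this is why an individual-perspective model needs an up-to-date estimate of the latent variable function.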
Taxonomy
Topics: Green IT and Sustainability · Transportation and Mobility Innovations
Methods: Mirror Descent Policy Optimization
