Model-Based Decentralized Policy Optimization

Hao Luo, Jiechuan Jiang, and Zongqing Lu

arXiv:2302.08139 · cs.LG · February 17, 2023

Open Access

TL;DR

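This paper introduces MDPO, a model-based method for decentralized policy optimization in cooperative multi-agent tasks. It uses a latent variable function to help construct the transition and reward functions from each agent's individual perspective, with the aim of making policy improvement more stable and monotonic.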

Contribution

The paper proposes a model-based decentralized policy optimization method that incorporates a latent variable function, and it analyzes theoretically that the resulting policy optimization is more stable than model-free decentralized policy optimization.

Findings

01

MDPO achieves superior performance in cooperative multi-agent tasks.

02

The authors' theoretical analysis indicates that policy optimization in MDPO is more stable than in model-free decentralized policy optimization.

03

A latent variable prediction method is proposed to reduce the error of the latent variable function.

Abstract

Decentralized policy optimization is commonly used in cooperative multi-agent tasks. However, because all agents update their policies simultaneously, the environment is non-stationary from the perspective of each individual agent, which makes monotonic policy improvement hard to guarantee. To make policy improvement stable and monotonic, we propose model-based decentralized policy optimization (MDPO), which incorporates a latent variable function to help construct the transition and reward functions from an individual perspective. We show theoretically that the policy optimization of MDPO is more stable than model-free decentralized policy optimization. Moreover, due to non-stationarity, the latent variable function varies and is hard to model. We further propose a latent variable prediction method to reduce the error of the latent variable function,…
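The non-stationarity described in the abstract can be illustrated with a small numerical sketch. This is not MDPO itself and is not taken from the paper: the toy environment, the array layout, and the helper name `individual_transition` are all assumptions made for illustration. It only shows that when another agent changes its policy, the transition function seen by a single agent changes too, even though the joint dynamics stay fixed.

```python
import numpy as np

# Hypothetical toy setup (not from the paper): 2 states, 2 agents, 2 actions each.
# joint_P[s, a_i, a_j, s_next] is the joint transition probability.
rng = np.random.default_rng(0)
joint_P = rng.random((2, 2, 2, 2))
joint_P /= joint_P.sum(axis=-1, keepdims=True)  # normalize over next states


def individual_transition(joint_P, other_policy):
    """Marginalize the other agent's action out of the joint transition.

    other_policy[s, a_j] is the probability that the other agent picks a_j in state s.
    Returns P_i[s, a_i, s_next], the dynamics as perceived by agent i alone.
    """
    return np.einsum("sj,sijt->sit", other_policy, joint_P)


# Two policies for the other agent, standing in for "before" and "after" its update.
policy_before = np.array([[0.9, 0.1], [0.9, 0.1]])
policy_after = np.array([[0.1, 0.9], [0.1, 0.9]])

P_before = individual_transition(joint_P, policy_before)
P_after = individual_transition(joint_P, policy_after)

# Largest change in agent i's perceived dynamics, caused only by the other agent's update.
shift = np.abs(P_before - P_after).max()
print(f"max change in perceived transition probabilities: {shift:.3f}")
```

Both `P_before` and `P_after` are valid transition functions for agent i, yet they differ, which is the sense in which the environment looks non-stationary to an individual agent under simultaneous updates.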

Taxonomy

Topics: Green IT and Sustainability · Transportation and Mobility Innovations

Methods: Model-Based Decentralized Policy Optimization