Scalable Model-based Policy Optimization for Decentralized Networked Systems

Yali Du; Chengdong Ma; Yuchen Liu; Runji Lin; Hao Dong; Jun Wang; Yaodong Yang

arXiv:2207.06559·cs.LG·September 5, 2022·1 cites

TL;DR

This paper introduces DMPO, a decentralized model-based reinforcement learning framework that enhances data efficiency for multi-agent networked systems by local modeling and communication, reducing sample complexity.

Contribution

The paper proposes a novel decentralized model-based policy optimization method with theoretical guarantees and demonstrates superior data efficiency in multi-agent control benchmarks.

Findings

1. Achieves higher data efficiency than model-free methods.
2. Matches performance of true model-based approaches.
3. Effective in transportation and traffic control tasks.

Abstract

Reinforcement learning algorithms require a large number of samples, which often limits their real-world application even on simple tasks. This challenge is more pronounced in multi-agent tasks, where each step of operation is more costly because it requires communication or the shifting of resources. This work aims to improve the data efficiency of multi-agent control through model-based learning. We consider networked systems where agents are cooperative and communicate only locally with their neighbors, and propose the decentralized model-based policy optimization framework (DMPO). In our method, each agent learns a dynamics model to predict future states and broadcasts its predictions through communication; the policies are then trained on the model rollouts. To alleviate the bias of model-generated data, we restrict model usage to generating myopic rollouts, thus reducing the compounding error of…
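The mechanism the abstract describes (a local dynamics model per agent, predictions exchanged with neighbors, short model rollouts) can be illustrated with a toy sketch. This is not the paper's implementation. The ring topology, the scalar linear dynamics in `true_step`, the `LocalModel` class, and the function names are all hypothetical choices made for illustration, and the policy update that would consume the rollouts is left out.

```python
import random


def neighbors(i, n):
    """Assumed ring topology: an agent observes its left neighbor, itself, its right neighbor."""
    return [(i - 1) % n, i, (i + 1) % n]


def true_step(states, actions):
    """Hypothetical ground-truth dynamics: local mixing of neighbor states plus own action."""
    n = len(states)
    return [
        0.2 * states[(i - 1) % n] + 0.5 * states[i]
        + 0.2 * states[(i + 1) % n] + 0.1 * actions[i]
        for i in range(n)
    ]


class LocalModel:
    """Linear model of one agent's next state, given neighbor states and its own action."""

    def __init__(self, n_inputs):
        self.w = [0.0] * n_inputs

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, target, lr=0.05):
        # One stochastic-gradient step on the squared prediction error.
        err = self.predict(x) - target
        self.w = [wi - lr * err * xi for wi, xi in zip(self.w, x)]


def fit_local_models(n_agents=4, steps=2000, seed=0):
    """Fit each agent's model using only data from its own neighborhood."""
    rng = random.Random(seed)
    models = [LocalModel(4) for _ in range(n_agents)]
    for _ in range(steps):
        states = [rng.uniform(-1, 1) for _ in range(n_agents)]
        actions = [rng.uniform(-1, 1) for _ in range(n_agents)]
        nxt = true_step(states, actions)
        for i in range(n_agents):
            x = [states[j] for j in neighbors(i, n_agents)] + [actions[i]]
            models[i].update(x, nxt[i])
    return models


def model_rollout(models, states, policy, horizon):
    """Short rollout: at each step agents read their neighbors' predicted states."""
    n = len(states)
    trajectory = [list(states)]
    for _ in range(horizon):
        local_obs = [[states[j] for j in neighbors(i, n)] for i in range(n)]
        actions = [policy(i, local_obs[i]) for i in range(n)]
        states = [models[i].predict(local_obs[i] + [actions[i]]) for i in range(n)]
        trajectory.append(list(states))
    return trajectory


if __name__ == "__main__":
    models = fit_local_models()
    # A placeholder policy that always outputs zero; a learned policy would go here.
    rollout = model_rollout(models, [1.0, -0.5, 0.25, 0.0], lambda i, obs: 0.0, horizon=3)
    for step, s in enumerate(rollout):
        print(step, [round(v, 3) for v in s])
```

Keeping `horizon` small mirrors the abstract's point about myopic rollouts: each predicted state feeds the next prediction, so model error accumulates with rollout length.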

Taxonomy

Topics: Traffic Prediction and Management Techniques · Traffic control and management · Vehicular Ad Hoc Networks (VANETs)