# Proximal Policy Optimization with Mixed Distributed Training

**Authors:** Zhenyu Zhang, Xiangfeng Luo, Tong Liu, Shaorong Xie, Jianshu Wang, Wei Wang, Yang Li, Yan Peng

arXiv: 1907.06479 · 2019-10-01

## TL;DR

This paper introduces MDPPO, an improved distributed reinforcement learning algorithm that trains multiple policies simultaneously, leading to faster and more stable training, especially in environments with sparse rewards.

## Contribution

The paper proposes MDPPO, a novel extension of PPO that uses mixed distributed training with multiple policies and auxiliary trajectories to enhance stability and convergence speed.
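MDPPO keeps PPO's policy update and changes only the data each policy trains on. The sketch below shows the standard PPO clipped surrogate objective, which any PPO extension still optimizes. It is a minimal sketch: the function names are hypothetical, and the clip range ε = 0.2 is a common PPO default, not a value taken from this paper.

```python
import math
from typing import Sequence


def probability_ratio(logp_new: float, logp_old: float) -> float:
    """r = pi_new(a|s) / pi_old(a|s), computed from log-probabilities."""
    return math.exp(logp_new - logp_old)


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def ppo_clip_objective(
    ratios: Sequence[float],
    advantages: Sequence[float],
    epsilon: float = 0.2,  # assumed clip range (common default)
) -> float:
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch.

    This is the quantity PPO maximizes; a training loop would
    minimize its negative.
    """
    if not ratios or len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must be non-empty and equal length")
    terms = [
        min(r * a, clip(r, 1.0 - epsilon, 1.0 + epsilon) * a)
        for r, a in zip(ratios, advantages)
    ]
    return sum(terms) / len(terms)


# Example: one over-confident increase and one decrease, both clipped.
example = ppo_clip_objective([1.5, 0.5], [1.0, -1.0])  # approximately 0.2
```

Clipping removes the incentive to move the ratio outside [1 - ε, 1 + ε]. This is the mechanism PPO relies on for stable updates, and MDPPO inherits it unchanged.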

## Key findings

- MDPPO accelerates training compared to standard PPO.
- MDPPO improves stability in sparse reward environments.
- Using auxiliary trajectories enhances convergence speed.

## Abstract

Instability and slowness are two main problems in deep reinforcement learning. Even though proximal policy optimization (PPO) is the state of the art, it still suffers from both. We introduce an improved algorithm based on PPO, mixed distributed proximal policy optimization (MDPPO), and show that it can accelerate and stabilize the training process. In our algorithm, multiple different policies train simultaneously, and each of them controls several identical agents that interact with environments. Actions are sampled by each policy separately as usual, but the trajectories used for training are collected from all agents rather than only from the agents of a single policy. We find that if we carefully choose some auxiliary trajectories to train the policies, the algorithm becomes more stable and converges faster, especially in environments with sparse rewards.
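The data-mixing step described in the abstract can be sketched as follows. Every policy trains on the trajectories of its own agents plus a few auxiliary trajectories taken from the agents of the other policies. The abstract does not say how auxiliary trajectories are chosen, so the selection rule below (highest episode return first) is an **assumption** made for illustration. The names `Trajectory`, `build_training_sets`, and `num_auxiliary` are also hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Trajectory:
    policy_id: int  # policy that sampled the actions
    agent_id: int   # agent (environment copy) that produced the trajectory
    rewards: List[float] = field(default_factory=list)

    @property
    def episode_return(self) -> float:
        return sum(self.rewards)


def build_training_sets(
    trajectories: List[Trajectory],
    num_policies: int,
    num_auxiliary: int,
) -> Dict[int, List[Trajectory]]:
    """Return, per policy, its own trajectories plus auxiliary ones.

    ASSUMPTION: auxiliary trajectories are the highest-return
    trajectories produced by the other policies' agents.
    """
    training_sets: Dict[int, List[Trajectory]] = {}
    for pid in range(num_policies):
        own = [t for t in trajectories if t.policy_id == pid]
        others = [t for t in trajectories if t.policy_id != pid]
        auxiliary = sorted(others, key=lambda t: t.episode_return, reverse=True)
        training_sets[pid] = own + auxiliary[:num_auxiliary]
    return training_sets


# Sparse-reward toy case: only agent 0 of policy 1 reached the goal.
trajs = [
    Trajectory(0, 0, [0.0, 0.0]),
    Trajectory(0, 1, [0.0, 0.0]),
    Trajectory(1, 0, [0.0, 1.0]),
    Trajectory(1, 1, [0.0, 0.0]),
]
sets = build_training_sets(trajs, num_policies=2, num_auxiliary=1)
```

In this toy case, policy 0 never sees a reward from its own agents but still receives the one successful trajectory from policy 1. This is the intuition behind the claimed speed-up in sparse-reward environments. Trajectories taken from another policy are off-policy for the learner. This sketch only covers their selection and does not show how the paper handles them in the PPO update.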

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.06479/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1907.06479/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/1907.06479/full.md

---
Source: https://tomesphere.com/paper/1907.06479