# A Logarithmic Barrier Method For Proximal Policy Optimization

**Authors:** Cheng Zeng, Hongming Zhang

arXiv: 1812.06502 · 2018-12-18

## TL;DR

This paper introduces PPO-B, a variant of Proximal Policy Optimization (PPO) that uses an interior penalty (logarithmic barrier) method to improve sampling efficiency and performance in reinforcement learning tasks.

## Contribution

The paper proposes a new surrogate objective built on an interior penalty method, in place of the exterior penalty method used in PPO, with the aim of improving sampling efficiency.

## Key findings

- PPO-B outperforms PPO in Atari and MuJoCo environments.
- PPO-B achieves better sampling efficiency than PPO.
- The method maintains PPO's advantages like easy implementation and good generalization.

## Abstract

Proximal policy optimization (PPO) has been proposed as a first-order optimization method for reinforcement learning. Notably, it relies on an exterior penalty method. The minimizers of exterior penalty functions often approach feasibility only in the limit as the penalty parameter grows increasingly large, which may result in low sampling efficiency. Our method, which we call proximal policy optimization with barrier method (PPO-B), keeps almost all of the advantages of PPO, such as easy implementation and good generalization. Specifically, a new surrogate objective with an interior penalty method is proposed to avoid the defect arising from the exterior penalty method. We conclude that PPO-B outperforms PPO in terms of sampling efficiency, since PPO-B achieved clearly better performance than PPO on Atari and MuJoCo environments.
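
The abstract's contrast between exterior and interior penalties can be illustrated on a toy problem that is not taken from the paper: minimize `x**2` subject to `x >= 1`, whose constrained solution is `x = 1`. The sketch below is an assumption-laden illustration only. The problem, the quadratic form of the exterior penalty, and the function names `exterior_minimizer` and `barrier_minimizer` are all hypothetical, and it does not reproduce the PPO-B surrogate objective.

```python
import math


def exterior_minimizer(mu: float) -> float:
    """Minimizer of x**2 + mu * max(0, 1 - x)**2 (quadratic exterior penalty).

    Toy problem chosen for illustration; not the paper's objective.
    """
    # For x < 1 the gradient is 2*x - 2*mu*(1 - x); setting it to zero
    # gives x = mu / (1 + mu), which is always strictly below 1.
    return mu / (1.0 + mu)


def barrier_minimizer(t: float) -> float:
    """Minimizer of x**2 - t * log(x - 1) over x > 1 (logarithmic barrier).

    Toy problem chosen for illustration; not the paper's objective.
    """
    # Gradient 2*x - t / (x - 1) = 0 gives 2*x**2 - 2*x - t = 0;
    # the root above 1 is (1 + sqrt(1 + 2*t)) / 2.
    return (1.0 + math.sqrt(1.0 + 2.0 * t)) / 2.0


if __name__ == "__main__":
    # Exterior penalty: iterates are infeasible (x < 1) for every finite mu.
    for mu in (1.0, 10.0, 100.0, 1000.0):
        print(f"exterior mu={mu:g}: x={exterior_minimizer(mu):.6f}")
    # Log barrier: iterates are strictly feasible (x > 1) for every t > 0.
    for t in (1.0, 0.1, 0.01, 0.001):
        print(f"barrier  t={t:g}: x={barrier_minimizer(t):.6f}")
```

In this toy setting the exterior-penalty minimizer violates the constraint for every finite penalty parameter and reaches `x = 1` only as `mu` grows without bound, while the barrier minimizer stays inside the feasible region and approaches `x = 1` as `t` shrinks toward zero.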

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.06502/full.md

## Figures

61 figures with captions in the complete paper: https://tomesphere.com/paper/1812.06502/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/1812.06502/full.md

---
Source: https://tomesphere.com/paper/1812.06502