Joint action loss for proximal policy optimization

Xiulei Song; Yizhao Jin; Greg Slabaugh; Simon Lucas

arXiv:2301.10919 · cs.LG · January 27, 2023 · 1 citation

TL;DR

This paper introduces a novel joint action loss for PPO that improves sample efficiency and performance in complex environments by separately considering sub-actions and combining joint and separate probabilities.

Contribution

The paper proposes a multi-action mixed loss that lets PPO handle compound actions better and reduces how often samples are clipped during updates, leading to significant performance gains.

Findings

01 Over 50% performance improvement in MuJoCo environments.

02 Sub-action loss outperforms standard PPO in Gym-μRTS.

03 Better balance of sample efficiency and action quality.

Abstract

PPO (Proximal Policy Optimization) is a state-of-the-art policy gradient algorithm that has been successfully applied to complex computer games such as Dota 2 and Honor of Kings. In these environments, an agent makes compound actions consisting of multiple sub-actions. PPO uses clipping to restrict policy updates. Although clipping is simple and effective, it is not efficient in its sample use. For compound actions, most PPO implementations consider the joint probability (density) of sub-actions, which means that if the ratio of a sample (a state and compound-action pair) exceeds the clipping range, the gradient the sample produces is zero. Instead, for each sub-action we calculate the loss separately, which is less prone to clipping during updates, thereby making better use of samples. Further, we propose a multi-action mixed loss that combines joint and separate probabilities. We perform experiments…
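The difference between the joint ratio and per-sub-action ratios described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the clipping range `eps=0.2`, the choice to sum the per-sub-action terms, and the toy probabilities are all assumptions made for illustration. It shows a compound action whose joint ratio falls outside the clipping range while each sub-action ratio stays inside it.

```python
import math


def clip(x, lo, hi):
    return max(lo, min(hi, x))


def clipped_surrogate(ratio, advantage, eps=0.2):
    # Standard PPO clipped objective for a single ratio (to be maximized).
    return min(ratio * advantage, clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)


def is_clipped(ratio, advantage, eps=0.2):
    # True when the clipped branch is active, so the gradient w.r.t. the ratio is zero.
    return (advantage > 0 and ratio > 1.0 + eps) or (advantage < 0 and ratio < 1.0 - eps)


def joint_objective(new_logps, old_logps, advantage, eps=0.2):
    # Joint ratio: product of sub-action ratios, computed from summed log-probabilities.
    ratio = math.exp(sum(new_logps) - sum(old_logps))
    return clipped_surrogate(ratio, advantage, eps), is_clipped(ratio, advantage, eps)


def sub_action_objective(new_logps, old_logps, advantage, eps=0.2):
    # One ratio per sub-action, each clipped on its own.
    # Summing the terms is an assumption; averaging would work the same way here.
    ratios = [math.exp(n - o) for n, o in zip(new_logps, old_logps)]
    total = sum(clipped_surrogate(r, advantage, eps) for r in ratios)
    flags = [is_clipped(r, advantage, eps) for r in ratios]
    return total, flags


# Toy compound action with three sub-actions (hypothetical numbers):
# each sub-action probability moves from 0.50 to 0.55, a per-sub-action ratio of about 1.1.
old_logps = [math.log(0.50)] * 3
new_logps = [math.log(0.55)] * 3
advantage = 1.0

joint_value, joint_clipped = joint_objective(new_logps, old_logps, advantage)
sub_value, sub_clipped = sub_action_objective(new_logps, old_logps, advantage)

print("joint clipped:", joint_clipped)
print("sub-actions clipped:", sub_clipped)
```

With these numbers the joint ratio is about 1.1³ ≈ 1.33, which exceeds 1 + eps, so the joint sample contributes no gradient, whereas every individual ratio of about 1.1 stays inside the range and still produces one.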

Code & Models

Repositories

ubiquition/drl (PyTorch, official)

Taxonomy

Topics: Artificial Intelligence in Games · Reinforcement Learning in Robotics · Topic Modeling

Methods: Entropy Regularization · Proximal Policy Optimization · Contrastive Language-Image Pre-training