ERPPO: Entropy Regularization-based Proximal Policy Optimization

Changha Lee; Gyusang Cho

arXiv:2605.13131·cs.LG·May 14, 2026

TL;DR

ERPPO introduces an entropy regularization technique to improve multi-agent reinforcement learning by enhancing exploration and stability in high-ambiguity environments, leading to better object detection accuracy.

Contribution

The paper proposes ERPPO, a novel method that dynamically adjusts policy regularization based on observation ambiguity to improve multi-agent RL performance.
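For orientation, the standard PPO clipped surrogate with an entropy bonus can be written so that the bonus weight depends on an ambiguity score. Only the probability ratio, the clipped surrogate, and the entropy term are standard PPO. The per-step score u_t in [0, 1] (standing in for the ambiguity estimate), the linear form of β(u_t), and the constants β_0 and κ are illustrative assumptions, not the formula from the paper. The objective is maximized.

```latex
r_t(\theta) = \frac{\pi_\theta(a_t \mid o_t)}{\pi_{\theta_{\text{old}}}(a_t \mid o_t)}, \qquad
\beta(u_t) = \beta_0\,(1 + \kappa\,u_t)

L(\theta) = \hat{\mathbb{E}}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\big) + \beta(u_t)\,\mathcal{H}\big(\pi_\theta(\cdot \mid o_t)\big)\Big]
```

With κ = 0 this reduces to ordinary PPO with a fixed entropy coefficient β_0; with κ > 0, more ambiguous observations receive a stronger push toward exploratory (higher-entropy) action distributions.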

Findings

01. ERPPO outperforms MAPPO in accuracy and gradient magnitude.

02. ERPPO reduces false detections in uncertain visual conditions.

03. Experimental results confirm improved search success in maritime scenarios.

Abstract

Multi-Agent Proximal Policy Optimization (MAPPO) is a variant of the Proximal Policy Optimization (PPO) algorithm specifically tailored for multi-agent reinforcement learning (MARL). MAPPO optimizes cooperative multi-agent settings by employing a centralized critic with decentralized actors. However, in multi-dimensional environments, MAPPO cannot extract an optimal policy because agent observations are non-stationary. To overcome this problem, we introduce a novel approach, Entropy Regularization-based Proximal Policy Optimization (ERPPO). For policy optimization, we first define object detection ambiguity in a multi-dimensional observation environment. A Distributional Spatiotemporal Ambiguity (DSA) learner is trained to estimate object detection uncertainty under non-stationary constraints. Then, we enhance PPO with a novel Entropy Regularization term. This regularization…
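The abstract is truncated before it specifies the regularization term, so the following is only a minimal NumPy sketch of the general idea: a PPO clipped-surrogate loss for discrete actions whose entropy bonus is scaled by a per-step ambiguity score. The function names, the `ambiguity` input (assumed to lie in [0, 1] and to come from something like the DSA learner), and the hypothetical schedule `beta0 * (1 + kappa * ambiguity)` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def categorical_entropy(logits):
    # Shannon entropy (in nats) of softmax(logits), one value per row.
    log_p = log_softmax(logits)
    return -(np.exp(log_p) * log_p).sum(axis=-1)


def entropy_regularized_ppo_loss(new_logits, old_log_probs, actions, advantages,
                                 ambiguity, clip_eps=0.2, beta0=0.01, kappa=1.0):
    """Return -(clipped surrogate + beta(u) * entropy), averaged over steps.

    HYPOTHETICAL: beta(u) = beta0 * (1 + kappa * u) with u in [0, 1] an assumed
    per-step ambiguity score. This is not the formula from the ERPPO paper.
    """
    new_log_probs = log_softmax(new_logits)[np.arange(len(actions)), actions]
    # Probability ratio pi_new(a|o) / pi_old(a|o).
    ratio = np.exp(new_log_probs - old_log_probs)
    # Standard PPO clipped surrogate (to be maximized).
    surrogate = np.minimum(ratio * advantages,
                           np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages)
    # Ambiguity-scaled entropy bonus: more ambiguous steps get a larger weight.
    beta = beta0 * (1.0 + kappa * np.asarray(ambiguity, dtype=float))
    entropy = categorical_entropy(new_logits)
    # Negate so the result can be minimized by a gradient-descent optimizer.
    return float(-(surrogate + beta * entropy).mean())


if __name__ == "__main__":
    logits = np.zeros((2, 2))                 # uniform policy over 2 actions
    old = np.log(np.full(2, 0.5))
    acts = np.array([0, 1])
    adv = np.ones(2)
    print(entropy_regularized_ppo_loss(logits, old, acts, adv, ambiguity=np.zeros(2)))
    print(entropy_regularized_ppo_loss(logits, old, acts, adv, ambiguity=np.ones(2)))
```

For the same policy, raising the ambiguity score lowers the loss by enlarging the entropy bonus, so gradient descent favors keeping the action distribution spread out where observations are uncertain. In a real training loop the same computation would be written in an autodiff framework so that gradients flow through `new_logits`.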
