AM-PPO: (Advantage) Alpha-Modulation with Proximal Policy Optimization

Soham Sane

arXiv:2505.15514 · cs.LG · May 22, 2025

TL;DR

AM-PPO introduces an adaptive advantage modulation mechanism to improve the stability and performance of PPO in reinforcement learning by dynamically scaling advantage estimates based on their statistical properties.

Contribution

The paper proposes a novel advantage modulation technique with an alpha controller for PPO, enhancing stability and learning efficiency in reinforcement learning.
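The abstract describes the alpha controller only qualitatively. The following is a minimal Python sketch under stated assumptions, not the author's update rule. It measures "saturation" as the fraction of a batch's advantages whose tanh output exceeds a threshold after scaling by alpha over the batch RMS norm, then nudges alpha multiplicatively toward a target saturation level. The class name `AlphaController`, the threshold, step size, bounds, and the exponential update are all hypothetical. The variance term mentioned in the abstract is not modeled separately here.

```python
import math
from dataclasses import dataclass


@dataclass
class AlphaController:
    """Hypothetical alpha controller (assumed rule, not the paper's exact method).

    Adjusts a scaling factor alpha so that the fraction of saturated tanh
    outputs tracks a predefined target saturation level.
    """
    alpha: float = 1.0
    target_saturation: float = 0.2     # hypothetical target fraction of saturated outputs
    saturation_threshold: float = 0.9  # |tanh(.)| above this counts as saturated (assumed)
    lr: float = 0.1                    # step size of the multiplicative update (assumed)
    alpha_min: float = 1e-3
    alpha_max: float = 1e3

    def saturation(self, advantages):
        # Divide by the batch RMS norm so the statistic does not depend on raw scale.
        n = len(advantages)
        rms = math.sqrt(sum(a * a for a in advantages) / n) + 1e-8
        saturated = sum(
            1 for a in advantages
            if abs(math.tanh(self.alpha * a / rms)) > self.saturation_threshold
        )
        return saturated / n

    def update(self, advantages):
        # Raise alpha when too few outputs saturate, lower it when too many do.
        error = self.target_saturation - self.saturation(advantages)
        self.alpha *= math.exp(self.lr * error)
        # Keep alpha inside a fixed, assumed safety range.
        self.alpha = min(max(self.alpha, self.alpha_min), self.alpha_max)
        return self.alpha
```

The multiplicative (exponential) step keeps alpha strictly positive without a separate sign check. For example, a batch like `[1.0, -1.0, 1.0, -1.0]` produces no saturated outputs at alpha = 1, so `update` increases alpha slightly.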

Findings

01. Achieves superior reward trajectories on continuous control benchmarks.

02. Reduces the need for clipping in adaptive optimizers.

03. Demonstrates improved stability and sustained learning progression.

Abstract

Proximal Policy Optimization (PPO) is a widely used reinforcement learning algorithm that heavily relies on accurate advantage estimates for stable and efficient training. However, raw advantage signals can exhibit significant variance, noise, and scale-related issues, impeding optimal learning performance. To address this challenge, we introduce Advantage Modulation PPO (AM-PPO), a novel enhancement of PPO that adaptively modulates advantage estimates using a dynamic, non-linear scaling mechanism. This adaptive modulation employs an alpha controller that dynamically adjusts the scaling factor based on evolving statistical properties of the advantage signals, such as their norm, variance, and a predefined target saturation level. By incorporating a tanh-based gating function driven by these adaptively scaled advantages, AM-PPO reshapes the advantage signals to stabilize gradient updates…
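As a rough illustration of where such gating sits in PPO, the Python sketch below squashes RMS-normalized, alpha-scaled advantages through tanh and feeds them to the standard PPO clipped surrogate. The truncated abstract does not give the exact AM-PPO gating formula, so `modulate_advantages` and its normalization are assumptions. `ppo_clip_objective` is the usual clipped objective, with eps = 0.2 as a common default. Alpha is passed in as a fixed number here, whereas in AM-PPO it is set by the adaptive alpha controller.

```python
import math


def modulate_advantages(advantages, alpha):
    """Hypothetical tanh gating of advantages (assumed form, not the paper's formula).

    Scales each advantage by alpha / (batch RMS norm) and squashes it with tanh,
    giving sign-preserving, order-preserving values bounded in (-1, 1).
    """
    n = len(advantages)
    rms = math.sqrt(sum(a * a for a in advantages) / n) + 1e-8
    return [math.tanh(alpha * a / rms) for a in advantages]


def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Standard PPO clipped surrogate (to be maximized), averaged over the batch.

    ratios[i] is pi_new(a_i | s_i) / pi_old(a_i | s_i).
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - eps), 1.0 + eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)
```

For example, `modulate_advantages([0.5, -3.0, 12.0, 0.1], alpha=1.5)` keeps the outlier 12.0 below 1 in magnitude while preserving every sign and the original ordering, so no single sample dominates the surrogate computed by `ppo_clip_objective`.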

Taxonomy

Topics: Semiconductor materials and devices · Radiation Effects in Electronics

Methods: Entropy Regularization · Proximal Policy Optimization