Proximal Policy Optimization with Adaptive Exploration
Andrei Lixandru

TL;DR
axPPO introduces an adaptive exploration mechanism that adjusts the magnitude of exploration during training based on the agent's recent performance, improving learning efficiency in reinforcement learning tasks.
Contribution
This paper presents a novel adaptive exploration framework integrated with PPO, enhancing exploration efficiency and performance in reinforcement learning.
Findings
axPPO outperforms standard PPO in learning efficiency
Adaptive exploration improves early-stage exploration behavior
Method demonstrates robustness across different environments
Abstract
This paper introduces Proximal Policy Optimization with Adaptive Exploration (axPPO), a novel reinforcement learning algorithm. It investigates the exploration-exploitation tradeoff and aims to contribute new insights into reinforcement learning algorithm design. The proposed adaptive exploration framework dynamically adjusts the magnitude of exploration during training based on the agent's recent performance. axPPO outperforms standard PPO in learning efficiency, particularly when substantial exploratory behavior is needed early in the learning process.
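The abstract does not say exactly how the exploration magnitude is derived from recent performance. As a minimal sketch, assume exploration is controlled through PPO's entropy bonus (the taxonomy lists entropy regularization) and that the entropy coefficient shrinks linearly as the mean of the last few episode returns approaches an assumed target range. The class `AdaptiveEntropyCoef` and its parameters `base_coef`, `window`, `return_min`, and `return_max` are hypothetical illustrations, not the paper's definitions.

```python
from collections import deque
from statistics import mean


class AdaptiveEntropyCoef:
    """Hypothetical schedule: scale PPO's entropy bonus by recent performance.

    Assumption (not from the paper): low recent returns -> more exploration,
    returns near return_max -> exploration bonus decays toward zero.
    """

    def __init__(self, base_coef=0.01, window=10, return_min=0.0, return_max=1.0):
        if return_max <= return_min:
            raise ValueError("return_max must exceed return_min")
        self.base_coef = base_coef
        self.returns = deque(maxlen=window)  # only the most recent episodes count
        self.return_min = return_min
        self.return_max = return_max

    def update(self, episode_return):
        self.returns.append(episode_return)

    def coef(self):
        if not self.returns:
            return self.base_coef  # no data yet: explore at full strength
        progress = (mean(self.returns) - self.return_min) / (self.return_max - self.return_min)
        progress = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
        return self.base_coef * (1.0 - progress)


def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for a single sample."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)


def total_loss(ratios, advantages, entropies, entropy_coef, eps=0.2):
    """Loss to minimize: negative (clipped surrogate + adaptive entropy bonus)."""
    n = len(ratios)
    surrogate = sum(ppo_clip_objective(r, a, eps) for r, a in zip(ratios, advantages)) / n
    entropy = sum(entropies) / n
    return -(surrogate + entropy_coef * entropy)


if __name__ == "__main__":
    sched = AdaptiveEntropyCoef(base_coef=0.02, window=3, return_min=0.0, return_max=100.0)
    for ret in (20.0, 40.0, 60.0):
        sched.update(ret)
    print(sched.coef())  # mean return 40 -> 60% of base_coef
```

With this design, an agent that is still performing poorly keeps a large entropy bonus and explores broadly early in training, while one whose recent returns approach the target relies mostly on the clipped surrogate. A real implementation would compute the surrogate and entropy over minibatch tensors rather than Python lists.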
Taxonomy
Topics: Advanced Bandit Algorithms Research · Risk and Portfolio Optimization · Reservoir Engineering and Simulation Methods
Methods: Entropy Regularization · Proximal Policy Optimization
