Proximal Policy Optimization with Adaptive Exploration

Andrei Lixandru

arXiv:2405.04664·cs.LG·May 9, 2024·1 cites

TL;DR

axPPO adds an adaptive exploration mechanism to PPO that adjusts the exploration magnitude during training based on the agent's recent performance, improving learning efficiency over standard PPO.

Contribution

This paper presents a novel adaptive exploration framework integrated with PPO, enhancing exploration efficiency and performance in reinforcement learning.

Findings

01. axPPO outperforms standard PPO in learning efficiency

02. Adaptive exploration improves early-stage exploration behavior

03. Method demonstrates robustness across different environments

Abstract

Proximal Policy Optimization with Adaptive Exploration (axPPO) is introduced as a novel learning algorithm. This paper investigates the exploration-exploitation tradeoff within the context of reinforcement learning and aims to contribute new insights into reinforcement learning algorithm design. The proposed adaptive exploration framework dynamically adjusts the exploration magnitude during training based on the recent performance of the agent. Our proposed method outperforms standard PPO algorithms in learning efficiency, particularly when significant exploratory behavior is needed at the beginning of the learning process.
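
The abstract does not give the update rule, so the following is only a minimal sketch of one way an exploration magnitude could track recent performance. It assumes exploration is controlled through the entropy coefficient of the PPO loss and that the coefficient shrinks linearly as the mean of recent episode returns approaches a target. The class `AdaptiveEntropyCoef`, the function `ppo_loss_with_entropy`, and the parameters `base_coef`, `target_return`, and `window` are hypothetical names chosen for illustration, not the paper's formulation.

```python
from collections import deque


class AdaptiveEntropyCoef:
    """Hypothetical schedule: shrink the entropy bonus as recent returns improve.

    This is an illustrative assumption, not the rule used by axPPO.
    """

    def __init__(self, base_coef=0.01, target_return=200.0, window=10):
        self.base_coef = base_coef          # entropy coefficient at full exploration
        self.target_return = target_return  # assumed return at which exploration stops
        self.returns = deque(maxlen=window)  # sliding window of recent episode returns

    def update(self, episode_return):
        """Record the return of a finished episode."""
        self.returns.append(episode_return)

    def value(self):
        """Current entropy coefficient, in [0, base_coef]."""
        if not self.returns:
            return self.base_coef  # no data yet: explore at full strength
        mean_return = sum(self.returns) / len(self.returns)
        # Fraction of the target reached, clipped to [0, 1].
        progress = min(max(mean_return / self.target_return, 0.0), 1.0)
        return self.base_coef * (1.0 - progress)


def ppo_loss_with_entropy(policy_loss, value_loss, entropy, entropy_coef,
                          value_coef=0.5):
    """Standard PPO total loss: the entropy term is subtracted, so a larger
    coefficient rewards more stochastic (more exploratory) policies."""
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```

In a training loop, `update` would be called once per finished episode and `value()` read before each optimization step, replacing the fixed entropy coefficient that standard PPO implementations use.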

Code & Models

Repositories

andreilix/axppo (official)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Risk and Portfolio Optimization · Reservoir Engineering and Simulation Methods

Methods: Entropy Regularization · Proximal Policy Optimization