Exploring More When It Needs in Deep Reinforcement Learning

Youtian Guo; Qi Gao

arXiv:2109.13477·cs.LG·September 29, 2021

TL;DR

This paper introduces AN2N (Add Noise to Noise), an exploration mechanism for deep reinforcement learning that explores more in states where the agent has historically performed poorly, improving efficiency and convergence in continuous control tasks.

Contribution

It proposes a novel exploration method that adaptively explores more in states with poor past performance, integrated with DDPG and SAC algorithms.

Findings

01 AN2N improves performance in continuous control tasks.

02 AN2N accelerates convergence speed.

03 AN2N enhances exploration efficiency.

Abstract

We propose an exploration mechanism for policies in Deep Reinforcement Learning, called Add Noise to Noise (AN2N), which explores more when the agent needs it. The core idea is that when the agent is in a state where it has historically performed poorly, it needs to explore more. We therefore use cumulative rewards to identify the past states in which the agent has not performed well, and use cosine distance to measure whether the current state needs more exploration. This method shows that such an exploration mechanism in the agent's policy is conducive to efficient exploration. We combine AN2N with the Deep Deterministic Policy Gradient (DDPG) and Soft Actor-Critic (SAC) algorithms and apply it to continuous control tasks such as HalfCheetah, Hopper, and Swimmer, achieving considerable improvement in performance and convergence speed.
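The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only states that cumulative rewards identify poorly performing past states and that cosine distance decides whether the current state needs more exploration. Everything else here is an assumption made for illustration, including ranking whole episodes by return, the `fraction` and `threshold` values, the Gaussian noise scales, and all function names.

```python
import math
import random


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity); 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def select_poor_states(episodes, fraction=0.2):
    """Collect the states visited in the lowest-return episodes.

    episodes: list of (states, rewards) pairs, one pair per episode.
    fraction: hypothetical share of episodes treated as "poor" (assumption).
    """
    ranked = sorted(episodes, key=lambda ep: sum(ep[1]))  # lowest return first
    n_poor = max(1, int(len(ranked) * fraction))
    return [s for states, _ in ranked[:n_poor] for s in states]


def needs_more_exploration(state, poor_states, threshold=0.05):
    """True if the state is within `threshold` cosine distance of any poor state."""
    return any(cosine_distance(state, p) <= threshold for p in poor_states)


def explore_action(action, state, poor_states,
                   base_std=0.1, extra_std=0.3, rng=random):
    """Add base exploration noise, plus extra noise near poor states."""
    noisy = [a + rng.gauss(0.0, base_std) for a in action]
    if needs_more_exploration(state, poor_states):
        # "Noise on noise": a second perturbation on top of the base noise.
        noisy = [a + rng.gauss(0.0, extra_std) for a in noisy]
    return noisy
```

In a DDPG-style loop, `explore_action` would wrap the deterministic policy output before the action is sent to the environment, with `poor_states` refreshed periodically from recent episodes. How the paper actually selects states and scales the extra noise may differ from this sketch.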

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Advanced Bandit Algorithms Research