Population-Guided Parallel Policy Search for Reinforcement Learning

Whiyoung Jung; Giseung Park; Youngchul Sung

arXiv:2001.02907 · cs.LG · January 10, 2020 · 6 cites

TL;DR

This paper introduces a population-guided parallel learning scheme for off-policy reinforcement learning in which multiple learners share experience and are guided by the best policy found so far, making policy search faster and more effective.

Contribution

It proposes a novel population-guided parallel policy search method, proves monotone improvement of the expected cumulative return, and demonstrates the method's effectiveness by applying it to TD3, where it outperforms existing algorithms.

Findings

01 Outperforms state-of-the-art RL algorithms

02 Enables faster policy convergence

03 Effective in sparse reward environments

Abstract

In this paper, a new population-guided parallel learning scheme is proposed to enhance the performance of off-policy reinforcement learning (RL). In the proposed scheme, multiple identical learners with their own value functions and policies share a common experience replay buffer and search for a good policy in collaboration, guided by the best policy information. The key point is that the information of the best policy is fused in a soft manner, by constructing an augmented loss function for the policy update, so that the multiple learners enlarge the overall search region. The guidance by the previous best policy and the enlarged search range enable faster and better policy search. Monotone improvement of the expected cumulative return by the proposed scheme is proved theoretically. Working algorithms are constructed by applying the proposed scheme to the twin delayed deep deterministic…
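The scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class and function names, the scalar linear policies, the mean-squared action gap used as the distance, the weight `beta`, and the rule of picking the best learner by recent return are all assumptions made for illustration. The sketch only shows the two ideas named in the abstract: one replay buffer shared by all learners, and a policy loss augmented with a soft pull toward the best policy.

```python
import random
from collections import deque


class SharedReplayBuffer:
    """One buffer that every learner writes to and samples from."""

    def __init__(self, capacity=10_000):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size, rng):
        k = min(batch_size, len(self.storage))
        return rng.sample(list(self.storage), k)


def select_best(recent_returns):
    """Index of the learner with the highest recent return (assumed criterion)."""
    return max(range(len(recent_returns)), key=lambda i: recent_returns[i])


def guidance_penalty(policy, best_policy, states):
    """Mean squared action gap to the best policy (assumed distance measure)."""
    gaps = [(policy(s) - best_policy(s)) ** 2 for s in states]
    return sum(gaps) / len(gaps)


def augmented_policy_loss(base_loss, policy, best_policy, states, beta):
    """Base policy loss plus a soft pull, weighted by beta, toward the best policy."""
    return base_loss + beta * guidance_penalty(policy, best_policy, states)


# Toy setup: three hypothetical scalar linear policies a = w * s.
rng = random.Random(0)
weights = [0.5, 1.0, 2.0]
policies = [lambda s, w=w: w * s for w in weights]

buffer = SharedReplayBuffer()
for _ in range(100):
    s = rng.uniform(-1.0, 1.0)
    for pi in policies:  # every learner contributes experience to the same buffer
        buffer.add((s, pi(s)))

recent_returns = [1.0, 3.0, 2.0]  # made-up evaluation scores, one per learner
best = select_best(recent_returns)
states = [s for s, _ in buffer.sample(32, rng)]
losses = [
    augmented_policy_loss(0.0, pi, policies[best], states, beta=0.1)
    for pi in policies
]
```

In this sketch the penalty vanishes for the best learner, since its distance to itself is zero, while the other learners are penalized in proportion to how far their actions drift from the best policy's. Because the pull is a weighted term rather than a hard copy of parameters, each learner can still explore around the best policy, which is the sense of "soft" fusion suggested by the abstract.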

Code & Models

Repositories

wyjung0625/p3s · tf · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Smart Grid Energy Management

Methods: Experience Replay