Population-Guided Parallel Policy Search for Reinforcement Learning
Whiyoung Jung, Giseung Park, Youngchul Sung

TL;DR
This paper introduces a population-guided parallel learning scheme for off-policy reinforcement learning in which multiple learners share a common experience replay buffer and are softly guided by the best policy found so far, enabling faster and better policy search.
Contribution
It proposes a novel population-guided parallel policy search method, proves monotone improvement of the expected cumulative return under the scheme, and demonstrates its effectiveness by applying it to TD3, where it outperforms existing algorithms.
Findings
Outperforms state-of-the-art RL algorithms
Enables faster policy convergence
Effective in sparse reward environments
Abstract
In this paper, a new population-guided parallel learning scheme is proposed to enhance the performance of off-policy reinforcement learning (RL). In the proposed scheme, multiple identical learners with their own value-functions and policies share a common experience replay buffer, and search for a good policy in collaboration with the guidance of the best policy information. The key point is that the information of the best policy is fused in a soft manner by constructing an augmented loss function for policy update to enlarge the overall search region covered by the multiple learners. The guidance by the previous best policy and the enlarged range enable faster and better policy search. Monotone improvement of the expected cumulative return by the proposed scheme is proved theoretically. Working algorithms are constructed by applying the proposed scheme to the twin delayed deep deterministic…
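The "soft" fusion of best-policy information described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact loss, so the form below is an assumption: a standard deterministic policy loss (negative mean critic value) plus a squared-distance penalty that pulls each non-best learner's actions toward those of the current best learner. The names `augmented_policy_loss`, `select_best`, `beta`, and `is_best` are hypothetical, as is the choice to rank learners by recent return and to exempt the best learner from the penalty.

```python
import numpy as np

def augmented_policy_loss(q_values, actions, best_actions, beta, is_best):
    """Illustrative augmented policy loss for one learner (assumed form).

    q_values:     shape (batch,), critic estimates Q(s, pi_i(s))
    actions:      shape (batch, act_dim), this learner's actions pi_i(s)
    best_actions: shape (batch, act_dim), the best learner's actions pi_b(s)
    beta:         guidance weight, beta >= 0 (hypothetical parameter)
    is_best:      True if this learner is itself the current best
    """
    # Base term: maximize the critic value, i.e. minimize its negative mean.
    base = -np.mean(q_values)
    if is_best:
        # Assumption: the best learner is not pulled toward itself.
        return base
    # Soft guidance: mean squared distance to the best policy's actions.
    dist = 0.5 * np.mean(np.sum((actions - best_actions) ** 2, axis=1))
    return base + beta * dist

def select_best(recent_returns):
    """Pick the learner index with the highest recent return (assumed criterion)."""
    return int(np.argmax(recent_returns))
```

With `beta = 0` every learner searches independently, and a very large `beta` collapses all learners onto the best policy. An intermediate value keeps the learners near the best policy while leaving them spread out, which is one way to read the "enlarged search region" in the abstract.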
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Smart Grid Energy Management
Methods: Experience Replay
