Enhancing Deep Deterministic Policy Gradients on Continuous Control Tasks with Decoupled Prioritized Experience Replay
Mehmet Efe Lorasdagi, Dogan Can Cicek, Furkan Burak Mutlu, Suleyman Serdar Kozat

TL;DR
This paper introduces Decoupled Prioritized Experience Replay (DPER), a novel method that independently samples transition batches for the Actor and Critic networks, improving performance in continuous control tasks.
Contribution
The paper proposes DPER, a new experience replay mechanism that decouples training data for Actor and Critic, enhancing deep deterministic policy gradient algorithms.
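The page does not give DPER's sampling equations. The sketch below assumes that each network draws from its own proportional distribution in the style of standard prioritized experience replay. Here $p^{Q}_i$ is the critic-side priority of transition $i$ (conventionally $|\delta_i| + \epsilon$, the absolute TD error plus a small constant), and $p^{\mu}_i$ is an actor-side priority. Its exact definition is an assumption, since the source does not specify it. $\alpha$ is the usual prioritization exponent.

$$
P_{\text{critic}}(i) = \frac{\left(p^{Q}_i\right)^{\alpha}}{\sum_{k} \left(p^{Q}_k\right)^{\alpha}},
\qquad
P_{\text{actor}}(i) = \frac{\left(p^{\mu}_i\right)^{\alpha}}{\sum_{k} \left(p^{\mu}_k\right)^{\alpha}}
$$

Because the two distributions depend on separate priority vectors, a transition that is informative for the Critic need not be drawn for the Actor, and the reverse also holds.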
Findings
DPER outperforms traditional replay strategies on MuJoCo benchmarks.
Decoupling experience replay improves training dynamics and policy quality.
DPER is compatible with various off-policy actor-critic algorithms.
Abstract
Background: Deep Deterministic Policy Gradient-based reinforcement learning algorithms utilize Actor-Critic architectures, where both networks are typically trained using identical batches of replayed transitions. However, the learning objectives and update dynamics of the Actor and Critic differ, raising concerns about whether uniform transition usage is optimal.
Objectives: We aim to improve the performance of deep deterministic policy gradient algorithms by decoupling the transition batches used to train the Actor and the Critic. Our goal is to design an experience replay mechanism that provides appropriate learning signals to each component by using separate, tailored batches.
Methods: We introduce Decoupled Prioritized Experience Replay (DPER), a novel approach that allows independent sampling of transition batches for the Actor and the Critic. DPER can be integrated into any…
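The core idea in the Methods paragraph is one shared transition store with independent sampling for the Actor and the Critic. The minimal Python sketch below illustrates that structure under stated assumptions and is not the authors' implementation. The class name `DecoupledPrioritizedBuffer` and its methods are hypothetical. Proportional prioritization with exponent `alpha` is assumed. Importance-sampling weights are omitted for brevity. The actor-side priority signal passed to `update_actor` is a placeholder, because the page does not say what DPER uses.

```python
import random
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class DecoupledPrioritizedBuffer:
    """Hypothetical sketch: one transition store, two independent priority vectors."""
    capacity: int
    alpha: float = 0.6   # prioritization exponent (assumed, as in standard PER)
    eps: float = 1e-6    # keeps every priority strictly positive
    transitions: List[Any] = field(default_factory=list)
    critic_prio: List[float] = field(default_factory=list)
    actor_prio: List[float] = field(default_factory=list)
    pos: int = 0         # next slot to overwrite once the buffer is full

    def add(self, transition: Any) -> None:
        # New transitions receive the current maximum priority in each vector,
        # so both networks are likely to see them soon after insertion.
        p_c = max(self.critic_prio, default=1.0)
        p_a = max(self.actor_prio, default=1.0)
        if len(self.transitions) < self.capacity:
            self.transitions.append(transition)
            self.critic_prio.append(p_c)
            self.actor_prio.append(p_a)
        else:
            # Ring buffer: overwrite the oldest slot.
            self.transitions[self.pos] = transition
            self.critic_prio[self.pos] = p_c
            self.actor_prio[self.pos] = p_a
        self.pos = (self.pos + 1) % self.capacity

    def _sample(self, prio: List[float], batch_size: int, rng) -> Tuple[List[int], List[Any]]:
        # Proportional sampling: P(i) is proportional to prio[i] ** alpha (with replacement).
        weights = [p ** self.alpha for p in prio]
        idx = rng.choices(range(len(prio)), weights=weights, k=batch_size)
        return idx, [self.transitions[i] for i in idx]

    def sample_critic(self, batch_size: int, rng=random):
        # Batch for the Critic, drawn only from the critic-side priorities.
        return self._sample(self.critic_prio, batch_size, rng)

    def sample_actor(self, batch_size: int, rng=random):
        # Independent batch for the Actor, drawn only from the actor-side priorities.
        return self._sample(self.actor_prio, batch_size, rng)

    def update_critic(self, idx: List[int], td_errors: List[float]) -> None:
        # Conventional PER priority: absolute TD error plus a small constant.
        for i, d in zip(idx, td_errors):
            self.critic_prio[i] = abs(d) + self.eps

    def update_actor(self, idx: List[int], scores: List[float]) -> None:
        # Placeholder actor-side signal; DPER's actual criterion is not given on this page.
        for i, s in zip(idx, scores):
            self.actor_prio[i] = abs(s) + self.eps
```

In a DDPG- or TD3-style training step, one would call `sample_critic` for the Bellman update and `sample_actor` separately for the policy-gradient update. Each network's priorities are then refreshed only from the batch that network consumed. Updating one vector leaves the other untouched, and that separation is what "decoupled" refers to in this sketch.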
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural Networks and Reservoir Computing · Domain Adaptation and Few-Shot Learning
