CUER: Corrected Uniform Experience Replay for Off-Policy Continuous Deep Reinforcement Learning Algorithms

Arda Sarp Yenicesu; Furkan B. Mutlu; Suleyman S. Kozat; Ozgur S. Oguz

arXiv:2406.09030·cs.LG·June 14, 2024



Open Access

TL;DR

This paper introduces CUER, a novel experience replay method that improves sample efficiency and stability in off-policy continuous reinforcement learning by balancing fairness and on-policy sampling.

Contribution

CUER is a new algorithm that samples stored experiences stochastically while accounting for fairness among all transitions, improving sample efficiency and stability in off-policy continuous control tasks.

Findings

01. Improves sample efficiency in off-policy algorithms

02. Enhances final policy performance

03. Increases training stability

Abstract

The experience replay mechanism enables agents to reuse their experiences multiple times. In previous studies, the sampling probability of the transitions was modified based on their relative significance. Reassigning sample probabilities for every transition in the replay buffer after each iteration is considered extremely inefficient. Hence, to improve computational efficiency, experience replay prioritization algorithms reassess the importance of a transition only when it is sampled. However, the relative importance of the transitions changes continually as the agent's policy and value function are iteratively updated. Furthermore, experience replay retains the transitions generated by the agent's past policies, which could potentially diverge significantly from the agent's most recent policy. An…
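To make the efficiency trade-off described in the abstract concrete, the following is a minimal sketch of a generic proportional prioritized replay buffer that refreshes a transition's priority only when that transition is sampled. It is not the CUER algorithm and is not taken from the paper; the class name `LazyPrioritizedBuffer`, its methods, the `eps` constant, and the use of the absolute TD error as the priority are all illustrative assumptions.

```python
import random


class LazyPrioritizedBuffer:
    """Toy replay buffer (hypothetical): priorities change only for sampled items."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.transitions = []
        self.priorities = []
        self.next_index = 0
        self.rng = random.Random(seed)

    def add(self, transition):
        # Assumption: new transitions start at the current maximum priority,
        # so they are likely to be sampled soon after insertion.
        priority = max(self.priorities, default=1.0)
        if len(self.transitions) < self.capacity:
            self.transitions.append(transition)
            self.priorities.append(priority)
        else:
            # Buffer is full: overwrite the oldest slot.
            self.transitions[self.next_index] = transition
            self.priorities[self.next_index] = priority
        self.next_index = (self.next_index + 1) % self.capacity

    def sample(self, batch_size):
        # Sample indices with probability proportional to stored priority.
        indices = self.rng.choices(
            range(len(self.transitions)), weights=self.priorities, k=batch_size
        )
        return indices, [self.transitions[i] for i in indices]

    def update_priorities(self, indices, td_errors, eps=1e-3):
        # Only the sampled transitions are refreshed; every other priority
        # keeps whatever value it had under an older policy and value function.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + eps


# Usage: fill the buffer, sample a batch, refresh only the sampled priorities.
buffer = LazyPrioritizedBuffer(capacity=4)
for t in range(4):
    buffer.add((f"s{t}", "a", 0.0, f"s{t + 1}"))
indices, batch = buffer.sample(2)
buffer.update_priorities(indices, [0.5, 2.0])
```

In this sketch, any transition that was not drawn keeps its old priority, which illustrates the staleness the abstract points to: stored priorities can lag behind the agent's current policy and value function.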


Taxonomy

Topics: Smart Grid Energy Management · Data Stream Mining Techniques · Age of Information Optimization

Methods: Experience Replay