Reward Prediction Error Prioritisation in Experience Replay: The RPE-PER Method
Hoda Yamani, Yuning Xing, Lee Violet C. Ong, Bruce A. MacDonald, Henry Williams

TL;DR
This paper introduces RPE-PER, a novel experience replay prioritisation method in reinforcement learning that uses reward prediction errors to select valuable experiences, improving learning speed and performance.
Contribution
The paper proposes RPE-PER, a new experience replay prioritisation technique based on reward prediction errors, inspired by biological learning mechanisms, and demonstrates its effectiveness in continuous control tasks.
Findings
RPE-PER accelerates learning in continuous control tasks.
RPE-PER improves the performance of off-policy actor-critic algorithms.
The method outperforms baseline experience replay strategies.
Abstract
Reinforcement Learning algorithms aim to learn optimal control strategies through iterative interactions with an environment. A critical element in this process is the experience replay buffer, which stores past experiences, allowing the algorithm to learn from a diverse range of interactions rather than just the most recent ones. This buffer is especially essential in dynamic environments with limited experiences. However, efficiently selecting high-value experiences to accelerate training remains a challenge. Drawing inspiration from the role of reward prediction errors (RPEs) in biological systems, where they are essential for adaptive behaviour and learning, we introduce Reward Predictive Error Prioritised Experience Replay (RPE-PER). This novel approach prioritises experiences in the buffer based on RPEs. Our method employs a critic network, EMCN, that predicts rewards in addition…
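The prioritisation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the details below are assumptions made for illustration: the RPE is taken as the absolute difference between the observed and the predicted reward, sampling is proportional to priority raised to an exponent `alpha`, a small constant `eps` keeps every priority non-zero, and new transitions receive the current maximum priority (a convention borrowed from standard prioritised experience replay). The class name `RPEPrioritisedBuffer` is hypothetical, and the predicted rewards are passed in as plain arrays rather than produced by the EMCN critic.

```python
import numpy as np


class RPEPrioritisedBuffer:
    """Toy replay buffer that samples transitions in proportion to |reward - predicted reward|."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=0):
        self.capacity = capacity
        self.alpha = alpha  # assumed exponent: 0 gives uniform sampling, larger values skew harder
        self.eps = eps      # assumed constant so that no stored transition has zero probability
        self.rng = np.random.default_rng(seed)
        self.transitions = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.next_idx = 0

    def add(self, transition):
        # Assumption: a new transition gets the current maximum priority,
        # so it is likely to be replayed before its RPE is known.
        size = len(self.transitions)
        max_priority = self.priorities[:size].max() if size > 0 else 1.0
        if size < self.capacity:
            self.transitions.append(transition)
        else:
            self.transitions[self.next_idx] = transition  # overwrite the oldest slot
        self.priorities[self.next_idx] = max_priority
        self.next_idx = (self.next_idx + 1) % self.capacity

    def probabilities(self):
        # Sampling probability of each stored transition: p_i^alpha / sum_j p_j^alpha.
        scaled = self.priorities[:len(self.transitions)] ** self.alpha
        return scaled / scaled.sum()

    def sample(self, batch_size):
        # Draw indices with replacement according to the current probabilities.
        idx = self.rng.choice(len(self.transitions), size=batch_size, p=self.probabilities())
        return idx, [self.transitions[i] for i in idx]

    def update_priorities(self, idx, rewards, predicted_rewards):
        # Priority = absolute reward prediction error, plus eps.
        rpe = np.abs(np.asarray(rewards, dtype=np.float64)
                     - np.asarray(predicted_rewards, dtype=np.float64))
        self.priorities[np.asarray(idx)] = rpe + self.eps


# Usage: four transitions, only the last one has a badly predicted reward.
buf = RPEPrioritisedBuffer(capacity=4)
for r in [0.0, 1.0, 2.0, 3.0]:
    buf.add({"reward": r})
buf.update_priorities([0, 1, 2, 3],
                      rewards=[0.0, 1.0, 2.0, 3.0],
                      predicted_rewards=[0.0, 1.0, 2.0, 0.0])
probs = buf.probabilities()
idx, batch = buf.sample(8)
```

In this toy run the transition whose reward was mispredicted dominates the sampling distribution, which is the behaviour the prioritisation is meant to produce. A practical version would compute `predicted_rewards` from the critic during each update and would typically use a sum-tree rather than a dense probability vector for efficiency.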
Taxonomy
Topics: Forecasting Techniques and Applications · Mental Health Research Topics · Image and Video Quality Assessment
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Experience Replay
