Reward Prediction Error Prioritisation in Experience Replay: The RPE-PER Method

Hoda Yamani; Yuning Xing; Lee Violet C. Ong; Bruce A. MacDonald; Henry Williams

arXiv:2501.18093·cs.LG·January 31, 2025

Open Access

TL;DR

This paper introduces RPE-PER, a novel experience replay prioritisation method in reinforcement learning that uses reward prediction errors to select valuable experiences, improving learning speed and performance.

Contribution

The paper proposes RPE-PER, a new experience replay prioritisation technique based on reward prediction errors, inspired by biological learning mechanisms, and demonstrates its effectiveness in continuous control tasks.

Findings

01. RPE-PER accelerates learning in continuous control tasks.

02. RPE-PER improves the performance of off-policy actor-critic algorithms.

03. The method outperforms baseline experience replay strategies.

Abstract

Reinforcement Learning algorithms aim to learn optimal control strategies through iterative interactions with an environment. A critical element in this process is the experience replay buffer, which stores past experiences, allowing the algorithm to learn from a diverse range of interactions rather than just the most recent ones. This buffer is especially essential in dynamic environments with limited experiences. However, efficiently selecting high-value experiences to accelerate training remains a challenge. Drawing inspiration from the role of reward prediction errors (RPEs) in biological systems, where they are essential for adaptive behaviour and learning, we introduce Reward Predictive Error Prioritised Experience Replay (RPE-PER). This novel approach prioritises experiences in the buffer based on RPEs. Our method employs a critic network, EMCN, that predicts rewards in addition…
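The prioritisation idea in the abstract can be illustrated with a minimal sketch: transitions whose predicted reward differs most from the observed reward are sampled more often. This is not the authors' implementation. The class name `RPEReplayBuffer`, the exponent `alpha`, the offset `eps`, and the ring-buffer layout are assumptions borrowed from common prioritised-replay practice, and the predicted rewards are passed in as plain numbers rather than produced by the paper's critic network.

```python
import random


class RPEReplayBuffer:
    """Toy replay buffer that samples transitions in proportion to |RPE|.

    Hypothetical illustration only: alpha, eps and the max-priority rule
    for new transitions are assumptions, not details taken from the paper.
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-3, seed=0):
        self.capacity = capacity
        self.alpha = alpha    # assumed exponent controlling how sharply priorities differ
        self.eps = eps        # assumed offset keeping every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0          # next slot to overwrite once the buffer is full
        self.rng = random.Random(seed)

    def add(self, transition):
        # New transitions receive the current maximum priority,
        # so they are likely to be sampled soon after insertion.
        priority = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(priority)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # Draw indices with replacement, weighted by stored priority.
        indices = self.rng.choices(
            range(len(self.data)), weights=self.priorities, k=batch_size
        )
        return indices, [self.data[i] for i in indices]

    def update(self, indices, predicted_rewards, actual_rewards):
        # Priority grows with the absolute reward prediction error.
        for i, r_hat, r in zip(indices, predicted_rewards, actual_rewards):
            rpe = abs(r - r_hat)
            self.priorities[i] = (rpe + self.eps) ** self.alpha
```

In a training loop, `update` would be called after each sampled batch with the reward predictions of whatever model plays the role of the reward-predicting critic, so that poorly predicted transitions are revisited more frequently.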

Taxonomy

Topics: Forecasting Techniques and Applications · Mental Health Research Topics · Image and Video Quality Assessment

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Experience Replay