Attention Loss Adjusted Prioritized Experience Replay

Zhuoying Chen; Huiping Li; Rizhong Wang

arXiv:2309.06684·cs.LG·October 10, 2023·2 cites

TL;DR

This paper introduces ALAP, an enhanced experience replay method that uses self-attention and double-sampling to reduce estimation errors in deep reinforcement learning, improving training efficiency.

Contribution

The paper proposes ALAP, a novel prioritized experience replay algorithm combining self-attention and double-sampling to regulate importance weights and mitigate sampling bias.

Findings

01. ALAP improves training efficiency across various RL algorithms.

02. ALAP reduces estimation error caused by non-uniform sampling.

03. ALAP demonstrates versatility in different RL environments.

Abstract

Prioritized Experience Replay (PER) is a technique in deep reinforcement learning that selects the more informative experience samples to speed up neural network training. However, the non-uniform sampling used in PER inevitably shifts the state-action space distribution and introduces estimation error into the Q-value function. This paper proposes an Attention Loss Adjusted Prioritized (ALAP) Experience Replay algorithm, which integrates an improved Self-Attention network with a Double-Sampling mechanism to fit the hyperparameter that regulates the importance sampling weights, in order to eliminate the estimation error caused by PER. To verify the effectiveness and generality of the algorithm, ALAP is tested with value-function based, policy-gradient based and multi-agent reinforcement learning algorithms in OpenAI Gym, and comparison studies verify the…
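For readers unfamiliar with the importance sampling weights mentioned in the abstract, the sketch below illustrates the standard PER formulation: transitions are sampled with probability proportional to a power of their TD error, and each sampled transition receives a weight (N · P(i))^(-β) to offset the resulting bias. This is not the ALAP algorithm. Here β is a fixed number, whereas the abstract describes fitting that hyperparameter with a Self-Attention network and Double-Sampling. The function names, the values of `alpha`, `beta`, and `eps`, and the random TD errors are all assumptions chosen for illustration.

```python
import numpy as np

def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Sampling probabilities proportional to (|TD error| + eps)^alpha."""
    priorities = (np.abs(td_errors) + eps) ** alpha
    return priorities / priorities.sum()

def importance_weights(probs, indices, beta):
    """Importance sampling weights (N * P(i))^(-beta), scaled so the batch max is 1."""
    n = len(probs)
    weights = (n * probs[indices]) ** (-beta)
    return weights / weights.max()

# Illustrative data only: random numbers stand in for TD errors in a replay buffer.
rng = np.random.default_rng(0)
td_errors = rng.normal(size=1000)

probs = per_probabilities(td_errors)
batch = rng.choice(len(probs), size=32, p=probs)  # non-uniform sampling
w = importance_weights(probs, batch, beta=0.4)    # fixed beta (assumption)

# The weights would multiply each sample's loss term in the Q-value update.
```

With `beta=0` every weight equals 1 and no correction is applied; with `beta=1` the non-uniform sampling probabilities are fully compensated. The choice of β therefore controls how much of the sampling bias is corrected, which is the quantity the abstract says ALAP fits.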

Taxonomy

Topics: Age of Information Optimization

Methods: Experience Replay