Associative Memory Based Experience Replay for Deep Reinforcement Learning

Mengyuan Li; Arman Kazemi; Ann Franchesca Laguna; X. Sharon Hu

arXiv:2207.07791·cs.AR·March 6, 2024·1 cites

Open Access

TL;DR

This paper introduces AMPER, a hardware-software co-designed prioritized experience replay (PER) scheme for deep reinforcement learning built on associative memory (AM). It reports a 55x to 270x latency reduction while keeping learning performance comparable to traditional PER.

Contribution

The paper presents an associative memory (AM) based PER that pairs an AM-friendly priority sampling method with an in-memory computing hardware architecture, reducing latency relative to traditional PER implementations.

Findings

01. AMPER achieves a 55x to 270x latency reduction.

02. AMPER maintains learning performance comparable to traditional PER.

03. The hardware design enables efficient in-memory search operations.
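
To make "in-memory search" concrete, the following is a minimal software sketch of what an associative (content-addressable) memory does in general: a query word is compared against every stored row, and the matching row indices come back. It is a generic illustration, not AMPER's hardware design; the class name `AssociativeMemory` and its methods are hypothetical, and the Python loops stand in for comparisons that AM hardware performs across all rows in parallel.

```python
def hamming(a, b):
    """Number of bit positions in which the integers a and b differ."""
    return bin(a ^ b).count("1")


class AssociativeMemory:
    """Toy content-addressable memory over integer bit patterns (hypothetical)."""

    def __init__(self):
        self.rows = []

    def write(self, word):
        """Store a word and return the row index it was written to."""
        self.rows.append(word)
        return len(self.rows) - 1

    def exact_match(self, query):
        """Return the indices of all rows equal to the query."""
        return [i for i, w in enumerate(self.rows) if w == query]

    def best_match(self, query):
        """Return the index of the row closest to the query in Hamming distance.

        Ties resolve to the lowest row index. Hardware would evaluate every
        row at once; this loop is sequential and only models the result.
        """
        return min(range(len(self.rows)),
                   key=lambda i: hamming(self.rows[i], query))


am = AssociativeMemory()
for word in (0b00000000, 0b11110000, 0b00001111):
    am.write(word)
```

With these three rows stored, `am.exact_match(0b11110000)` selects row 1, and `am.best_match(0b00000111)` selects row 2, the stored word that differs from the query in the fewest bits.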

Abstract

Experience replay is an essential component in deep reinforcement learning (DRL), which stores the experiences and generates experiences for the agent to learn in real time. Recently, prioritized experience replay (PER) has been proven to be powerful and widely deployed in DRL agents. However, implementing PER on traditional CPU or GPU architectures incurs significant latency overhead due to its frequent and irregular memory accesses. This paper proposes a hardware-software co-design approach to design an associative memory (AM) based PER, AMPER, with an AM-friendly priority sampling operation. AMPER replaces the widely-used time-costly tree-traversal-based priority sampling in PER while preserving the learning performance. Further, we design an in-memory computing hardware architecture based on AM to support AMPER by leveraging parallel in-memory search operations. AMPER shows…
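
For context on the tree-traversal-based priority sampling that the abstract says AMPER replaces, here is a minimal sketch of a sum tree, a common structure behind conventional PER. It shows the baseline only, not AMPER's AM-friendly sampling. The class `SumTree`, its method names, and the power-of-two capacity restriction are assumptions made for this illustration.

```python
import random


class SumTree:
    """Binary sum tree over `capacity` priorities (illustrative sketch)."""

    def __init__(self, capacity):
        # Assumption for simplicity: capacity is a power of two.
        assert capacity > 0 and capacity & (capacity - 1) == 0
        self.capacity = capacity
        # Node 1 is the root; leaves occupy capacity .. 2 * capacity - 1.
        self.nodes = [0.0] * (2 * capacity)

    def update(self, index, priority):
        """Set the priority of one leaf and refresh the sums above it."""
        pos = index + self.capacity
        self.nodes[pos] = priority
        pos //= 2
        while pos >= 1:
            self.nodes[pos] = self.nodes[2 * pos] + self.nodes[2 * pos + 1]
            pos //= 2

    def total(self):
        """Sum of all stored priorities."""
        return self.nodes[1]

    def find(self, mass):
        """Return the leaf whose cumulative-priority interval contains mass."""
        pos = 1
        while pos < self.capacity:
            left = 2 * pos
            if mass < self.nodes[left]:
                pos = left
            else:
                mass -= self.nodes[left]
                pos = left + 1
        return pos - self.capacity

    def sample(self, rng=random):
        """Draw one index with probability proportional to its priority."""
        return self.find(rng.random() * self.total())


tree = SumTree(4)
for i, p in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree.update(i, p)
```

Each call to `find` walks one node per tree level, and each step depends on the value read in the previous one. That dependent, scattered access pattern is one way to picture the "frequent and irregular memory accesses" the abstract attributes to PER on CPUs and GPUs.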

Taxonomy

Topics: Mind wandering and attention · Mental Health Research Topics · Advanced Bandit Algorithms Research

Methods: Attention Model · Prioritized Experience Replay · Experience Replay