Frugal Actor-Critic: Sample Efficient Off-Policy Deep Reinforcement Learning Using Unique Experiences

Nikhil Kumar Singh; Indranil Saha

arXiv:2402.05963 · cs.LG · February 13, 2024 · 1 citation

Open Access

TL;DR

This paper introduces an experience selection method for off-policy actor-critic reinforcement learning that improves sample efficiency by adding only unique experiences to the replay buffer, which leads to faster convergence and a smaller buffer.

Contribution

The paper proposes an experience selection technique that uses important state variables and kernel density estimation to improve sample efficiency in deep RL algorithms.

Findings

01 Reduces replay buffer size significantly across benchmarks

02 Achieves faster convergence than baseline algorithms

03 Attains better reward accumulation in continuous control tasks

Abstract

Efficient utilization of the replay buffer plays a significant role in the off-policy actor-critic reinforcement learning (RL) algorithms used for model-free control policy synthesis for complex dynamical systems. We propose a method for achieving sample efficiency, which focuses on selecting unique samples and adding them to the replay buffer during the exploration with the goal of reducing the buffer size and maintaining the independent and identically distributed (IID) nature of the samples. Our method is based on selecting an important subset of the set of state variables from the experiences encountered during the initial phase of random exploration, partitioning the state space into a set of abstract states based on the selected important state variables, and finally selecting the experiences with unique state-reward combination by using a kernel density estimator. We formally…
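
The three steps named in the abstract (choose important state variables, partition the state space into abstract states, keep only experiences with a unique state-reward combination) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The ranking criterion (absolute correlation with reward), the uniform grid used for abstraction, the density threshold, and every name here (`select_important_vars`, `abstract_state`, `kde_density`, `FrugalBuffer`) are assumptions made purely for illustration.

```python
import numpy as np

def select_important_vars(states, rewards, k):
    """Pick k state variables from random-exploration data.
    Assumption: importance = |Pearson correlation| with the reward."""
    s = states - states.mean(axis=0)
    r = rewards - rewards.mean()
    denom = np.sqrt((s ** 2).sum(axis=0) * (r ** 2).sum()) + 1e-12
    corr = np.abs(s.T @ r) / denom
    return np.sort(np.argsort(corr)[-k:])

def abstract_state(state, idx, low, high, bins):
    """Map the selected variables to a tuple of bin indices.
    Assumption: a uniform grid over [low, high] per variable."""
    frac = (state[idx] - low) / (high - low + 1e-12)
    return tuple(np.clip((frac * bins).astype(int), 0, bins - 1))

def kde_density(x, samples, bandwidth):
    """1-D Gaussian kernel density estimate at x."""
    z = (x - np.asarray(samples)) / bandwidth
    norm = len(samples) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z ** 2).sum() / norm

class FrugalBuffer:
    """Replay buffer that rejects experiences whose reward is already
    densely represented within the same abstract state (hypothetical rule)."""

    def __init__(self, idx, low, high, bins=10, bandwidth=0.1, threshold=0.5):
        self.idx, self.low, self.high = idx, low, high
        self.bins, self.bandwidth, self.threshold = bins, bandwidth, threshold
        self.rewards_by_cell = {}   # abstract state -> rewards stored so far
        self.storage = []           # accepted (s, a, r, s') tuples

    def add(self, state, action, reward, next_state):
        cell = abstract_state(state, self.idx, self.low, self.high, self.bins)
        seen = self.rewards_by_cell.setdefault(cell, [])
        # High estimated density means a similar reward was already stored here.
        if seen and kde_density(reward, seen, self.bandwidth) > self.threshold:
            return False
        seen.append(reward)
        self.storage.append((state, action, reward, next_state))
        return True
```

In this sketch, `select_important_vars` would be run once on the experiences gathered during initial random exploration, and `FrugalBuffer.add` would then replace the unconditional append of an ordinary replay buffer. The `bandwidth` and `threshold` values control how aggressively near-duplicate experiences are discarded and would need tuning per task.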

Taxonomy

Topics: Innovation and Socioeconomic Development · Innovation Diffusion and Forecasting

Methods: Sparse Evolutionary Training