MEET: A Monte Carlo Exploration-Exploitation Trade-off for Buffer Sampling

Julius Ott; Lorenzo Servadei; Jose Arjona-Medina; Enrico Rinaldi; Gianfranco Mauro; Daniela Sánchez Lopera; Michael Stephan; Thomas Stadelmayer; Avik Santra; Robert Wille

arXiv:2210.13545 · cs.LG · November 28, 2023

TL;DR

This paper introduces MEET, a novel buffer sampling method for reinforcement learning that uses Q-value uncertainty to balance exploration and exploitation, leading to faster and better policy learning.

Contribution

It proposes a new sampling strategy leveraging Q-value uncertainty to adapt exploration-exploitation trade-offs in experience replay buffers.

Findings

01. Outperforms state-of-the-art sampling strategies by 26% on average.
02. Demonstrates stable and improved convergence in classical control environments.
03. Enhances learning efficiency by focusing on significant transitions.

Abstract

Data selection is essential for any data-based optimization technique, such as Reinforcement Learning. State-of-the-art sampling strategies for the experience replay buffer improve the performance of the Reinforcement Learning agent. However, they do not incorporate uncertainty in the Q-Value estimation. Consequently, they cannot adapt the sampling strategies, including exploration and exploitation of transitions, to the complexity of the task. To address this, this paper proposes a new sampling strategy that leverages the exploration-exploitation trade-off. This is enabled by the uncertainty estimation of the Q-Value function, which guides the sampling to explore more significant transitions and, thus, learn a more efficient policy. Experiments on classical control environments demonstrate stable results across various environments. They show that the proposed method outperforms…
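
To make the idea of uncertainty-guided buffer sampling concrete, here is a minimal Python sketch. It is not the paper's method, nor the code in the linked repository. It assumes that each stored transition comes with several Q-value estimates (for example, from an ensemble or a multi-head network), scores a transition by the mean of those estimates plus `beta` times their standard deviation, and converts scores to sampling weights with a softmax. The function names, `beta`, `temperature`, and the scoring rule are all hypothetical choices made for illustration.

```python
import math
import random
from statistics import fmean, pstdev


def sampling_weights(q_estimates, beta=1.0, temperature=1.0):
    """Return unnormalized sampling weights, one per transition.

    q_estimates[i] holds several Q-value estimates for transition i
    (assumed to come from an ensemble; this is an illustrative assumption).
    """
    # Exploitation term: mean estimate. Exploration term: spread of estimates.
    scores = [fmean(qs) + beta * pstdev(qs) for qs in q_estimates]
    top = max(scores)  # subtract the maximum for numerical stability
    return [math.exp((s - top) / temperature) for s in scores]


def sample_indices(q_estimates, batch_size, rng, beta=1.0, temperature=1.0):
    """Draw buffer indices (with replacement) in proportion to the weights."""
    weights = sampling_weights(q_estimates, beta, temperature)
    return rng.choices(range(len(weights)), weights=weights, k=batch_size)


# Usage: three transitions with equal mean estimates; only the last one
# has disagreement between estimates, so it receives the largest weight.
rng = random.Random(0)
buffer_q = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [0.0, 2.0, 1.0]]
batch = sample_indices(buffer_q, batch_size=8, rng=rng)
```

In this sketch, setting `beta=0` reduces the scheme to sampling by mean Q-value alone, while larger values of `beta` shift sampling toward transitions whose value estimates disagree.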

Code & Models

Repositories

juliusott/uncertainty-buffer (PyTorch, official)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Simulation Techniques and Applications · Data Stream Mining Techniques

Methods: Experience Replay