MEET: A Monte Carlo Exploration-Exploitation Trade-off for Buffer Sampling
Julius Ott, Lorenzo Servadei, Jose Arjona-Medina, Enrico Rinaldi, Gianfranco Mauro, Daniela Sánchez Lopera, Michael Stephan, Thomas Stadelmayer, Avik Santra, Robert Wille

TL;DR
This paper introduces MEET, a novel buffer sampling method for reinforcement learning that uses Q-value uncertainty to balance exploration and exploitation, leading to faster and better policy learning.
Contribution
It proposes a new sampling strategy leveraging Q-value uncertainty to adapt exploration-exploitation trade-offs in experience replay buffers.
Findings
Outperforms state-of-the-art sampling strategies by 26% on average.
Demonstrates stable and improved convergence in classical control environments.
Enhances learning efficiency by focusing on significant transitions.
Abstract
Data selection is essential for any data-based optimization technique, such as Reinforcement Learning. State-of-the-art sampling strategies for the experience replay buffer improve the performance of the Reinforcement Learning agent. However, they do not incorporate uncertainty in the Q-Value estimation. Consequently, they cannot adapt the sampling strategies, including exploration and exploitation of transitions, to the complexity of the task. To address this, this paper proposes a new sampling strategy that leverages the exploration-exploitation trade-off. This is enabled by the uncertainty estimation of the Q-Value function, which guides the sampling to explore more significant transitions and, thus, learn a more efficient policy. Experiments on classical control environments demonstrate stable results across various environments. They show that the proposed method outperforms…
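The idea of letting Q-value uncertainty steer buffer sampling can be illustrated with a small sketch. This is not the authors' formulation, which the text above does not give. It assumes several Q-value estimates per stored transition (for example from multiple Q-heads) and blends their rescaled mean (exploitation) with their rescaled standard deviation (exploration). The names `sampling_probabilities`, `sample_batch`, `explore_weight`, and `temperature` are hypothetical and chosen only for illustration.

```python
import numpy as np


def _rescale(x):
    """Map values to [0, 1]; return zeros if all values are equal."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


def sampling_probabilities(q_estimates, explore_weight=0.5, temperature=0.1):
    """Turn per-transition Q-value estimates into sampling probabilities.

    q_estimates: array of shape (n_estimates, n_transitions).
    explore_weight: 0 favours high mean Q (exploitation),
                    1 favours high Q uncertainty (exploration).
    NOTE: illustrative assumption, not the scoring rule from the paper.
    """
    q_estimates = np.asarray(q_estimates, dtype=float)
    mean = _rescale(q_estimates.mean(axis=0))  # exploitation term
    std = _rescale(q_estimates.std(axis=0))    # exploration term (uncertainty)
    score = (1.0 - explore_weight) * mean + explore_weight * std
    # Numerically stable softmax over the blended scores.
    logits = (score - score.max()) / temperature
    probs = np.exp(logits)
    return probs / probs.sum()


def sample_batch(q_estimates, batch_size, rng, **kwargs):
    """Draw buffer indices according to the probabilities above."""
    probs = sampling_probabilities(q_estimates, **kwargs)
    return rng.choice(len(probs), size=batch_size, p=probs)


# Example: two Q-estimates for each of three stored transitions.
q = np.array([[1.0, 5.0, 2.0],
              [1.0, 5.0, 6.0]])
rng = np.random.default_rng(0)
batch_indices = sample_batch(q, batch_size=4, rng=rng, explore_weight=0.5)
```

With `explore_weight=0` the sketch prefers the transition with the highest mean Q-value, and with `explore_weight=1` it prefers the transition whose estimates disagree most. How such a weight would be set or adapted to task complexity is left open here.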
Taxonomy
Topics: Advanced Bandit Algorithms Research · Simulation Techniques and Applications · Data Stream Mining Techniques
Methods: Experience Replay
