EVaDE: Event-Based Variational Thompson Sampling for Model-Based Reinforcement Learning
Siddharth Aravindan, Dixant Mittal, Wee Sun Lee

TL;DR
This paper introduces EVaDE, a family of event-based variational distributions that improve exploration in model-based reinforcement learning through domain-informed convolutional layers, demonstrated on Atari games.
Contribution
The paper proposes EVaDE, a variational distribution that uses event-based convolutional layers to direct exploration in object-based domains within MBRL.
Findings
EVaDE improves exploration efficiency in Atari games.
EVaDE-SimPLe outperforms baseline methods.
Event-based layers facilitate better variational Thompson sampling.
Abstract
Posterior Sampling for Reinforcement Learning (PSRL) is a well-known algorithm that augments model-based reinforcement learning (MBRL) algorithms with Thompson sampling. PSRL maintains posterior distributions of the environment transition dynamics and the reward function, which are intractable for tasks with high-dimensional state and action spaces. Recent works show that dropout, used in conjunction with neural networks, induces variational distributions that can approximate these posteriors. In this paper, we propose Event-based Variational Distributions for Exploration (EVaDE), which are variational distributions that are useful for MBRL, especially when the underlying domain is object-based. We leverage the general domain knowledge of object-based domains to design three types of event-based convolutional layers to direct exploration. These layers rely on Gaussian dropouts and are…
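The abstract's idea that Gaussian dropout induces a variational distribution to sample from can be illustrated with a minimal sketch. It is not the paper's implementation and does not reproduce the three event-based layers: it only shows generic multiplicative Gaussian dropout applied to one convolution kernel, with a single sample drawn and held fixed, as in Thompson sampling. The function names, the kernel values, the noise variance `alpha`, and the toy `frame` are all hypothetical choices made for illustration.

```python
import random


def sample_gaussian_dropout_weights(weights, alpha, rng):
    """Draw one kernel from the distribution induced by multiplicative
    Gaussian dropout: w_sampled = w * eps, with eps ~ N(1, alpha).

    `alpha` is the noise variance (an assumed hyperparameter here)."""
    std = alpha ** 0.5
    return [[w * rng.gauss(1.0, std) for w in row] for row in weights]


def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation, stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


rng = random.Random(0)

# Hypothetical mean kernel and toy 3x3 input frame.
kernel_mean = [[0.0, 1.0], [1.0, 0.0]]
frame = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

# Thompson-sampling style usage: draw ONE kernel and keep it fixed
# for the whole rollout, instead of resampling noise at every step.
sampled_kernel = sample_gaussian_dropout_weights(kernel_mean, alpha=0.1, rng=rng)
features = conv2d_valid(frame, sampled_kernel)
```

Because the noise is multiplicative, kernel entries whose mean is zero stay zero in every sample, so uncertainty is expressed only through the weights the model actually uses.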
Taxonomy
Topics: Machine Learning and Algorithms · Reinforcement Learning in Robotics · Robotic Locomotion and Control
