Approximate Exploration through State Abstraction
Adrien Ali Taïga, Aaron Courville, Marc G. Bellemare

TL;DR
This paper analyzes how approximation, in particular density-based pseudo-count exploration bonuses, affects exploration efficiency in reinforcement learning, and proposes a new bonus motivated by that analysis.
Contribution
It provides a theoretical analysis of approximate exploration using density models, relates density models to state abstractions, and introduces a new pseudo-count bonus to address identified mismatches.
Findings
Approximation enables trade-offs between learning speed and policy quality.
Density models can be linked to state abstractions, affecting exploration.
A new pseudo-count bonus improves exploration performance.
Abstract
Although exploration in reinforcement learning is well understood from a theoretical point of view, provably correct methods remain impractical. In this paper we study the interplay between exploration and approximation, what we call approximate exploration. Our main goal is to further our theoretical understanding of pseudo-count based exploration bonuses (Bellemare et al., 2016), a practical exploration scheme based on density modelling. As a warm-up, we quantify the performance of an exploration algorithm, MBIE-EB (Strehl and Littman, 2008), when explicitly combined with state aggregation. This allows us to confirm that, as might be expected, approximation allows the agent to trade off between learning speed and quality of the learned policy. Next, we show how a given density model can be related to an abstraction and that the corresponding pseudo-count bonus can act as a substitute…
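The link between a density model, an abstraction, and a pseudo-count can be illustrated with a small sketch. It uses the pseudo-count definition of Bellemare et al. (2016), N̂(s) = ρ(s)(1 − ρ'(s)) / (ρ'(s) − ρ(s)), where ρ'(s) is the model's probability of s after one more observation of s. Everything else here is an assumption made for illustration and is not taken from the paper: the bucketing abstraction `aggregate`, the `EmpiricalDensity` class, and the state-only bonus form β/√(N̂ + 0.01). The paper's new bonus is not reproduced.

```python
from collections import Counter
from fractions import Fraction
import math


def aggregate(state, bucket_size=10):
    """Hypothetical abstraction: group integer states into fixed-size buckets."""
    return state // bucket_size


class EmpiricalDensity:
    """Empirical density over abstract states (illustrative, not the paper's model)."""

    def __init__(self, phi):
        self.phi = phi            # abstraction function: state -> abstract state
        self.counts = Counter()   # visit counts per abstract state
        self.total = 0            # total number of observations

    def prob(self, state):
        """rho(s): probability assigned to s before observing it again."""
        if self.total == 0:
            return Fraction(0)
        return Fraction(self.counts[self.phi(state)], self.total)

    def recoding_prob(self, state):
        """rho'(s): probability of s after one additional observation of s."""
        return Fraction(self.counts[self.phi(state)] + 1, self.total + 1)

    def update(self, state):
        self.counts[self.phi(state)] += 1
        self.total += 1


def pseudo_count(model, state):
    """Pseudo-count N_hat(s) = rho (1 - rho') / (rho' - rho)."""
    rho = model.prob(state)
    rho_prime = model.recoding_prob(state)
    if rho_prime == rho:
        # Only happens when every observation fell in this abstract state;
        # the formula is undefined there, so this sketch returns infinity.
        return math.inf
    return rho * (1 - rho_prime) / (rho_prime - rho)


def exploration_bonus(count, beta=1.0):
    """Count-based bonus beta / sqrt(count + 0.01); the form is an assumption."""
    return beta / math.sqrt(count + 0.01)


model = EmpiricalDensity(aggregate)
for s in [3, 7, 12, 25, 4]:   # buckets visited: 0, 0, 1, 2, 0
    model.update(s)

for s in (5, 18, 99):
    n_hat = pseudo_count(model, s)
    print(s, n_hat, round(exploration_bonus(n_hat), 3))
```

For this empirical model the pseudo-count of a state equals the visit count of its abstract state, so states 3, 4, 5, and 7 all share one count and one bonus. That is the sense in which a density model induces an abstraction: states the model cannot tell apart are explored as if they were a single state.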
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms
