Model-Free Active Exploration in Reinforcement Learning
Alessio Russo, Alexandre Proutiere

TL;DR
This paper introduces a model-free exploration strategy for reinforcement learning built on an approximation of the instance-specific, information-theoretical lower bound on sample complexity, so that efficient policies can be identified without estimating a model of the system.
Contribution
The paper derives an approximation of the instance-specific lower bound that involves only quantities that can be inferred with model-free methods, and uses it to build an ensemble-based exploration strategy applicable to both tabular and continuous MDPs.
Findings
Identifies efficient policies faster than existing exploration methods
Applies to both tabular and continuous state spaces
Numerical experiments support the gain in exploration efficiency
Abstract
We study the problem of exploration in Reinforcement Learning and present a novel model-free solution. We adopt an information-theoretical viewpoint and start from the instance-specific lower bound of the number of samples that have to be collected to identify a nearly-optimal policy. Deriving this lower bound along with the optimal exploration strategy entails solving an intricate optimization problem and requires a model of the system. In turn, most existing sample optimal exploration algorithms rely on estimating the model. We derive an approximation of the instance-specific lower bound that only involves quantities that can be inferred using model-free approaches. Leveraging this approximation, we devise an ensemble-based model-free exploration strategy applicable to both tabular and continuous Markov decision processes. Numerical results demonstrate that our strategy is able to…
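To make the phrase "ensemble-based model-free exploration" concrete, here is a minimal sketch of a generic bootstrapped ensemble of tabular Q-learners. It is not the authors' algorithm and does not implement their lower-bound approximation. The toy chain MDP, the function name `run_ensemble_q_learning`, and all hyperparameters are assumptions chosen for illustration. The sketch only shows how an ensemble can drive exploration without a model, and how sub-optimality gaps, a quantity that commonly appears in instance-specific sample-complexity bounds, can be estimated from Q-values alone.

```python
import numpy as np

def run_ensemble_q_learning(n_states=5, n_actions=2, n_members=8,
                            n_steps=5000, gamma=0.9, lr=0.1,
                            epsilon=0.1, seed=0):
    """Bootstrapped tabular Q-learning ensemble on a toy chain MDP.

    Hypothetical illustration only; not the method from the paper.
    """
    rng = np.random.default_rng(seed)
    # One Q-table per ensemble member; random init makes members disagree.
    q = rng.normal(0.0, 1.0, size=(n_members, n_states, n_actions))
    visits = np.zeros((n_states, n_actions), dtype=int)
    state = 0
    for _ in range(n_steps):
        # Sample one member and act greedily with respect to it.
        member = rng.integers(n_members)
        action = int(np.argmax(q[member, state]))
        # Small amount of forced exploration (an assumption of this sketch).
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        # Toy dynamics: action 1 moves right, action 0 resets to the start.
        next_state = min(state + 1, n_states - 1) if action == 1 else 0
        # Reward only for choosing action 1 in the last state.
        reward = 1.0 if (state == n_states - 1 and action == 1) else 0.0
        visits[state, action] += 1
        # Bootstrap mask: each member sees the transition with probability 0.5.
        mask = rng.random(n_members) < 0.5
        # Model-free TD target, computed per member from its own Q-table.
        target = reward + gamma * q[:, next_state].max(axis=1)
        q[mask, state, action] += lr * (target[mask] - q[mask, state, action])
        state = next_state
    q_mean = q.mean(axis=0)
    # Estimated sub-optimality gaps: max_a Q(s, a) - Q(s, a), always >= 0.
    gaps = q_mean.max(axis=1, keepdims=True) - q_mean
    return q_mean, gaps, visits

if __name__ == "__main__":
    q_mean, gaps, visits = run_ensemble_q_learning()
    print("mean Q-values:\n", q_mean)
    print("estimated gaps:\n", gaps)
    print("visit counts:\n", visits)
```

Everything the sketch uses is learned from sampled transitions: no transition matrix or reward model is estimated. A gap-aware strategy would go further and allocate more samples to state-action pairs whose gaps are small or uncertain; how the paper does this is not specified in the abstract above.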
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Simulation Techniques and Applications
