Importance Sampling based Exploration in Q Learning
Vijay Kumar, Mort Webster

TL;DR
This paper introduces an importance sampling method for action selection in Q-learning that balances exploration and exploitation without tuning parameters, and demonstrates it on a multi-stage generation expansion planning problem.
Contribution
The paper proposes an importance-sampling-based exploration strategy for Q-learning that relies solely on the value function approximation, eliminating the need for tuning parameters.
Findings
Outperforms traditional exploration methods like Epsilon Greedy in continuous action spaces.
Effectively balances exploration and exploitation without parameter tuning.
Demonstrates improved performance in a multi-stage generation expansion planning problem.
Abstract
Approximate Dynamic Programming (ADP) is a methodology for solving multi-stage stochastic optimization problems in multi-dimensional discrete or continuous spaces. ADP approximates the optimal value function by adaptively sampling both the action and state spaces. It provides a tractable approach to very large problems, but can suffer from the exploration-exploitation dilemma. We propose a novel approach for selecting actions using importance sampling weighted by the value function approximation in continuous decision spaces to address this dilemma. An advantage of this approach is that it balances exploration and exploitation when sampling actions without any tuning parameters, relying only on the approximate value function, unlike other exploration approaches such as Epsilon Greedy. We compare the proposed algorithm with other exploration strategies in continuous action space in the…
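The action-selection idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the sampling density or the normalization the authors use, so everything below is an assumption for illustration: the function name `sample_action_by_value`, the uniform candidate proposal, the min-shift used to make the weights non-negative, and the candidate count `n_candidates` (an artifact of this sketch, not a parameter from the paper). It shows one plausible way to draw a continuous action with probability proportional to its approximate value, and is not the authors' algorithm.

```python
import numpy as np

def sample_action_by_value(q_fn, state, low, high, n_candidates=100, rng=None):
    """Draw one action from [low, high], favoring actions with higher approximate value.

    Hypothetical sketch: q_fn(state, action) stands in for the value function
    approximation; the weighting scheme is an assumption, not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Proposal step: uniform candidate actions over the continuous range.
    candidates = rng.uniform(low, high, size=n_candidates)
    values = np.array([q_fn(state, a) for a in candidates])
    # Shift so the worst candidate gets weight zero (assumed normalization).
    shifted = values - values.min()
    total = shifted.sum()
    if total <= 0:
        # All candidates look equally good: fall back to uniform sampling.
        probs = np.full(n_candidates, 1.0 / n_candidates)
    else:
        probs = shifted / total
    # Resample one candidate in proportion to its weight.
    return candidates[rng.choice(n_candidates, p=probs)]

# Usage with a toy value function that peaks at action 0.5.
toy_q = lambda state, action: -(action - 0.5) ** 2
action = sample_action_by_value(toy_q, None, 0.0, 1.0, rng=np.random.default_rng(0))
```

In this sketch, actions near the peak of the approximate value function are drawn more often while lower-valued actions keep a nonzero chance of being explored. Epsilon Greedy, by contrast, needs an explicit exploration rate.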
Taxonomy
Topics: Face and Expression Recognition · Reinforcement Learning in Robotics · Data Stream Mining Techniques
