Probabilistic Exploration in Planning while Learning
Grigoris I. Karakoulas

TL;DR
This paper introduces a probabilistic hill-climbing exploration algorithm for reinforcement learning, specifically enhancing Q-learning by statistically balancing exploration and exploitation in large state spaces.
Contribution
It presents a novel probabilistic exploration method that improves scalability and decision quality in Q-learning for complex tasks.
Findings
The proposed method outperforms typical exploration strategies in complex control tasks.
It provides a statistically sound approach to exploration that is adaptable to large state spaces.
The algorithm ensures high-probability near-optimal plan selection.
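To illustrate what selecting an option with high probability can look like in practice, here is a minimal sketch that keeps sampling two candidates until a Hoeffding confidence bound separates their mean rewards. This is a generic textbook technique, not the algorithm from the paper. The function names, the assumption that rewards lie in [0, 1], and the way the error budget is spread over rounds are all illustrative assumptions.

```python
import math

def hoeffding_radius(n, delta):
    """Half-width of a two-sided (1 - delta) Hoeffding interval
    for the mean of n i.i.d. samples bounded in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def select_better(sample_a, sample_b, delta=0.05, max_samples=10_000):
    """Sample two candidates in lockstep until their means are separated.

    sample_a and sample_b are hypothetical zero-argument callables that
    return a reward in [0, 1]. Returns (winner, samples_per_candidate),
    where winner is "a", "b", or None if the budget ran out first.
    """
    sum_a = sum_b = 0.0
    for n in range(1, max_samples + 1):
        sum_a += sample_a()
        sum_b += sample_b()
        # Spend delta / (n * (n + 1)) of the error budget in round n; these
        # terms sum to delta over all rounds, so stopping early stays valid.
        delta_n = delta / (n * (n + 1))
        # Split the round's budget evenly between the two mean estimates.
        radius = hoeffding_radius(n, delta_n / 2.0)
        gap = sum_a / n - sum_b / n
        if abs(gap) > 2.0 * radius:
            return ("a" if gap > 0 else "b"), n
    return None, max_samples
```

Calling `select_better(lambda: 0.9, lambda: 0.1)` stops as soon as the observed gap exceeds twice the confidence radius. With stochastic samplers the same call applies, and the number of samples needed grows as the true gap between the candidates shrinks.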
Abstract
Sequential decision tasks with incomplete information are characterized by the exploration problem, namely the trade-off between further exploration for learning more about the environment and immediate exploitation of the accrued information for decision-making. Within artificial intelligence, there has been an increasing interest in studying planning-while-learning algorithms for these decision tasks. In this paper we focus on the exploration problem in reinforcement learning, and Q-learning in particular. The existing exploration strategies for Q-learning are of a heuristic nature and exhibit limited scalability in tasks with large (or infinite) state and action spaces. Efficient experimentation is needed for resolving uncertainties when possible plans are compared (i.e., exploration). The experimentation should be sufficient for selecting with statistical significance a locally…
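For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of tabular Q-learning with epsilon-greedy action selection, one common example of a heuristic exploration strategy. It is not the method proposed in the paper. The five-state chain environment, the function names, and the hyperparameter values are hypothetical and chosen only for illustration.

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon pick a random action, otherwise a greedy one
    (ties between equally valued actions are broken at random)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])

def q_update(q, s, a, r, s_next, done, alpha=0.1, gamma=0.95):
    """One-step Q-learning update; no bootstrapping from a terminal state."""
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])

def chain_step(s, a, n_states=5):
    """Hypothetical toy environment: action 1 moves right, action 0 moves
    left (bounded at state 0). Reaching the last state pays 1 and ends."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s_next == n_states - 1
    return s_next, (1.0 if done else 0.0), done

def train_chain(episodes=500, epsilon=0.2, seed=0, n_states=5, max_steps=1000):
    """Run epsilon-greedy Q-learning on the toy chain and return the Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = epsilon_greedy(q[s], epsilon, rng)
            s_next, r, done = chain_step(s, a, n_states)
            q_update(q, s, a, r, s_next, done)
            s = s_next
            if done:
                break
    return q
```

After `q = train_chain()`, the greedy action in each non-terminal state can be read off with `max(range(2), key=lambda a: q[s][a])`. The fixed `epsilon` is the heuristic part: it explores at the same rate regardless of how much evidence has accumulated, which is the kind of limitation the abstract attributes to existing strategies.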
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Formal Methods in Verification
