When Simple Exploration is Sample Efficient: Identifying Sufficient Conditions for Random Exploration to Yield PAC RL Algorithms

Yao Liu; Emma Brunskill

arXiv:1805.09045·cs.LG·April 19, 2019·6 cites

TL;DR

This paper investigates conditions under which simple random exploration strategies like epsilon-greedy can be sample efficient in reinforcement learning, providing theoretical bounds and empirical insights.

Contribution

It establishes problem-specific sample complexity bounds for Q-learning with random walk exploration based on structural properties of MDPs.

Findings

01. Bounded sample complexity in certain MDPs.

02. Empirical results align with theoretical polynomial bounds.

03. Insights into when simple exploration strategies are effective.

Abstract

Efficient exploration is one of the key challenges for reinforcement learning (RL) algorithms. Most traditional sample efficiency bounds require strategic exploration. Recently, many deep RL algorithms with simple heuristic exploration strategies that have few formal guarantees have achieved surprising success in many domains. These results raise an important question about understanding exploration strategies such as $\epsilon$-greedy, as well as understanding what characterizes the difficulty of exploration in MDPs. In this work we propose problem-specific sample complexity bounds for $Q$-learning with random walk exploration that rely on several structural properties. We also link our theoretical results to some empirical benchmark domains, to illustrate whether our bound gives polynomial sample complexity in these domains and how that relates to the empirical performance.
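To make "$Q$-learning with random walk exploration" concrete, here is a minimal sketch of tabular Q-learning in which the behavior policy picks actions uniformly at random and never consults the learned values. The toy chain MDP, the hyperparameters, and the function names `q_learning_random_walk` and `greedy_policy` are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the paper's bounds or experiments.

```python
import random


def q_learning_random_walk(n_states=5, n_steps=20000, gamma=0.9, alpha=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain MDP with uniformly random actions."""
    rng = random.Random(seed)
    # q[s][a]: action 0 moves left, action 1 moves right (walls at both ends).
    q = [[0.0, 0.0] for _ in range(n_states)]
    state = 0
    for _ in range(n_steps):
        # Random walk exploration: the action ignores the current Q estimates.
        action = rng.randrange(2)
        if action == 0:
            next_state = max(state - 1, 0)
        else:
            next_state = min(state + 1, n_states - 1)
        # Assumed reward: 1 whenever the step ends in the rightmost state.
        reward = 1.0 if next_state == n_states - 1 else 0.0
        # Off-policy Q-learning update toward the greedy bootstrap target.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Return the greedy action for each state under the learned Q table."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    q_table = q_learning_random_walk()
    print(greedy_policy(q_table))
```

On this small chain a uniform random walk visits every state-action pair often, so the greedy policy extracted at the end should move right in every state. Lengthening the chain or biasing the dynamics against reaching the rewarding state makes random exploration visit it far less often, which is the kind of structural dependence the abstract refers to.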

Taxonomy

Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Auction Theory and Applications