Auto-exploration for online reinforcement learning
Caleb Ju, Guanghui Lan

TL;DR
This paper introduces auto-exploration methods for online reinforcement learning. These methods automatically explore the state and action spaces without a priori knowledge of problem-dependent parameters, and they achieve optimal sample complexity with simple, parameter-free algorithms.
Contribution
The paper presents novel auto-exploration algorithms for RL that are parameter-free and achieve optimal sample complexity, applicable to both tabular and linear function approximation settings.
Findings
Achieves optimal sample complexity without a priori knowledge of problem-dependent parameters
Applicable to tabular and linear function approximation
Algorithms are simple and easy to implement
Abstract
The exploration-exploitation dilemma in reinforcement learning (RL) is a fundamental challenge to efficient RL algorithms. Existing algorithms for finite state and action discounted RL problems address this by assuming sufficient exploration over both state and action spaces. However, this yields non-implementable algorithms and sub-optimal performance. To resolve these limitations, we introduce a new class of methods with auto-exploration, or methods that automatically explore both state and action spaces in a parameter-free way, i.e., without a priori knowledge of problem-dependent parameters. We present two variants: one for the tabular setting and one for linear function approximation. Under algorithm-independent assumptions on the existence of an exploring optimal policy, both methods attain optimal sample complexity for solving the problem to $\epsilon$ error. Crucially, these…
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Bandit Algorithms Research
