Auto-exploration for online reinforcement learning

Caleb Ju; Guanghui Lan

arXiv:2512.06244 · cs.LG · December 9, 2025

TL;DR

This paper introduces auto-exploration methods for online reinforcement learning that automatically explore state and action spaces without prior parameter knowledge, achieving optimal sample complexity with simple, parameter-free algorithms.

Contribution

The paper presents novel auto-exploration algorithms for RL that are parameter-free and achieve optimal sample complexity, applicable to both tabular and linear function approximation settings.

Findings

01

Achieve $O(\epsilon^{-2})$ sample complexity without prior knowledge of problem-dependent parameters

02

Applicable to tabular and linear function approximation

03

Algorithms are simple and easy to implement

Abstract

The exploration-exploitation dilemma in reinforcement learning (RL) is a fundamental challenge to efficient RL algorithms. Existing algorithms for finite state and action discounted RL problems address this by assuming sufficient exploration over both state and action spaces. However, this yields non-implementable algorithms and sub-optimal performance. To resolve these limitations, we introduce a new class of methods with auto-exploration, or methods that automatically explore both state and action spaces in a parameter-free way, i.e., without a priori knowledge of problem-dependent parameters. We present two variants: one for the tabular setting and one for linear function approximation. Under algorithm-independent assumptions on the existence of an exploring optimal policy, both methods attain $O(\epsilon^{-2})$ sample complexity to solve to $\epsilon$ error. Crucially, these…

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Bandit Algorithms Research