On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL

Jinglin Chen; Aditya Modi; Akshay Krishnamurthy; Nan Jiang; Alekh Agarwal

arXiv:2206.10770 · cs.LG · October 25, 2022 · 1 citation

TL;DR

This paper investigates the sample efficiency and limitations of reward-free reinforcement learning with non-linear function approximation, proposing a new algorithm and analyzing its theoretical properties under various structural assumptions.

Contribution

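The paper introduces the RFOLIVE (Reward-Free OLIVE) algorithm for reward-free exploration under minimal structural assumptions, shows that the explorability or reachability assumptions previously made for linear completeness and low-rank MDPs are not statistically necessary, and establishes hardness results.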

Findings

1. RFOLIVE achieves sample-efficient reward-free exploration under minimal assumptions.

2. Explorability assumptions are shown to be statistically unnecessary in some settings.

3. Hardness results reveal exponential complexity gaps between different structural assumptions.

Abstract

We study reward-free reinforcement learning (RL) under general non-linear function approximation, and establish sample efficiency and hardness results under various standard structural assumptions. On the positive side, we propose the RFOLIVE (Reward-Free OLIVE) algorithm for sample-efficient reward-free exploration under minimal structural assumptions, which covers the previously studied settings of linear MDPs (Jin et al., 2020b), linear completeness (Zanette et al., 2020b) and low-rank MDPs with unknown representation (Modi et al., 2021). Our analyses indicate that the explorability or reachability assumptions, previously made for the latter two settings, are not necessary statistically for reward-free exploration. On the negative side, we provide a statistical hardness result for both reward-free and reward-aware exploration under linear completeness assumptions when the underlying…

Videos

On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Muscle activation and electromyography studies