A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation

Philip Amortila; Nan Jiang; Dhruv Madeka; Dean P. Foster

arXiv:2207.08342·cs.LG·July 19, 2022

TL;DR

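This paper introduces Delphi, a statistically and computationally efficient algorithm that blends exploration with interactive expert queries. When only the optimal value function is assumed to be linearly realizable, Delphi provably recovers an $\varepsilon$-suboptimal policy using $\tilde{O}(d)$ expert queries and $\mathrm{poly}(d, H, |A|, 1/\varepsilon)$ exploratory samples.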

Contribution

The paper presents Delphi, an algorithm that blends expert queries with exploration. Compared to pure RL approaches, it achieves an exponential improvement in sample complexity while requiring only $\tilde{O}(d)$ expert queries.

Findings

01

Delphi requires only $\tilde{O}(d)$ expert queries.

02

It achieves $\mathrm{poly}(d, H, |A|, 1/\varepsilon)$ sample complexity.

03

Lower bounds show that any learner with a polynomially bounded exploration budget needs at least $\tilde{\Omega}(\sqrt{d})$ oracle calls.

Abstract

The current paper studies sample-efficient Reinforcement Learning (RL) in settings where only the optimal value function is assumed to be linearly-realizable. It has recently been understood that, even under this seemingly strong assumption and access to a generative model, worst-case sample complexities can be prohibitively (i.e., exponentially) large. We investigate the setting where the learner additionally has access to interactive demonstrations from an expert policy, and we present a statistically and computationally efficient algorithm (Delphi) for blending exploration with expert queries. In particular, Delphi requires $\tilde{O}(d)$ expert queries and a $\mathrm{poly}(d, H, |A|, 1/\varepsilon)$ amount of exploratory samples to provably recover an $\varepsilon$-suboptimal policy. Compared to pure RL approaches, this corresponds to an exponential improvement in…
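
For readers unfamiliar with the terminology, the two notions the abstract relies on can be written out as follows. This is a sketch of one standard formalization, not necessarily the exact statement used in the paper: the feature map $\phi$, the parameter $\theta^\star$, the initial state $s_0$, and the returned policy $\pi$ are notational assumptions introduced here, while $d$, $H$, $A$, and $\varepsilon$ are the feature dimension, horizon, action set, and target accuracy from the abstract.

```latex
% Linear realizability of the optimal value function (assumed standard form):
% a known feature map \phi and an unknown parameter \theta^\star \in \mathbb{R}^d.
\[
  Q^\star(s, a) = \langle \phi(s, a), \theta^\star \rangle
  \quad \text{for all states } s \text{ and actions } a \in A .
\]
% \varepsilon-suboptimality of a returned policy \pi, measured from an
% initial state s_0 (assumed notation).
\[
  V^\star(s_0) - V^{\pi}(s_0) \le \varepsilon .
\]
```

Under this reading, only $Q^\star$ (or $V^\star$) needs a linear representation; value functions of other policies are not assumed to be linear in $\phi$.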

Videos

A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation · SlidesLive

Taxonomy

Topics: Machine Learning and Algorithms · Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics