A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation
Philip Amortila, Nan Jiang, Dhruv Madeka, Dean P. Foster

TL;DR
This paper introduces Delphi, a statistically and computationally efficient algorithm that blends exploration with interactive expert queries to achieve sample-efficient reinforcement learning when only the optimal value function is linearly realizable, using few expert queries and polynomially many exploratory samples.
Contribution
The paper presents Delphi, a novel algorithm that integrates expert queries with exploration, achieving an exponential improvement in sample complexity over pure RL approaches with only a small amount of expert input.
Findings
Delphi requires only $\tilde{O}(d)$ expert queries.
It achieves $\mathrm{poly}(d, H, |\mathcal{A}|, 1/\varepsilon)$ sample complexity.
Lower bounds show that any learner with a polynomially bounded exploration budget needs at least $\tilde{\Omega}(\sqrt{d})$ oracle calls.
Abstract
The current paper studies sample-efficient Reinforcement Learning (RL) in settings where only the optimal value function is assumed to be linearly-realizable. It has recently been understood that, even under this seemingly strong assumption and access to a generative model, worst-case sample complexities can be prohibitively (i.e., exponentially) large. We investigate the setting where the learner additionally has access to interactive demonstrations from an expert policy, and we present a statistically and computationally efficient algorithm (Delphi) for blending exploration with expert queries. In particular, Delphi requires $\tilde{O}(d)$ expert queries and a $\mathrm{poly}(d, H, |\mathcal{A}|, 1/\varepsilon)$ amount of exploratory samples to provably recover an $\varepsilon$-suboptimal policy. Compared to pure RL approaches, this corresponds to an exponential improvement in…
Taxonomy
Topics: Machine Learning and Algorithms · Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics
