Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes

Andrew Wagenmaker; Yifang Chen; Max Simchowitz; Simon S. Du; Kevin Jamieson

arXiv:2201.11206·cs.LG·June 22, 2022·5 cites

TL;DR

This paper demonstrates that reward-free reinforcement learning in linear MDPs is not harder than reward-aware RL, providing a computationally efficient algorithm with optimal sample complexity and matching lower bounds.

Contribution

It introduces the first efficient algorithm for reward-free RL in linear MDPs with optimal dimension dependence and establishes matching lower bounds for reward-aware RL.

Findings

01

Algorithm achieves near-optimal sample complexity scaling as $d^{2} H^{5} / \epsilon^{2}$

02

Lower bound matches the upper bound in dimension dependence

03

Exploration procedure is versatile for the analysis of linear MDPs
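
To make the first finding concrete, the sketch below evaluates only the leading term $d^{2} H^{5} / \epsilon^{2}$ and shows how it grows when one parameter changes. It is an illustration, not the paper's algorithm: the function name `sample_complexity_scaling` is hypothetical, and constants and logarithmic factors are assumed away.

```python
def sample_complexity_scaling(d: int, horizon: int, epsilon: float) -> float:
    """Leading term d^2 * H^5 / eps^2 (constants and log factors dropped)."""
    if d <= 0 or horizon <= 0 or epsilon <= 0:
        raise ValueError("d, horizon, and epsilon must be positive")
    return (d ** 2) * (horizon ** 5) / (epsilon ** 2)


# Hypothetical baseline values, chosen only for illustration.
base = sample_complexity_scaling(d=10, horizon=5, epsilon=0.1)

# Ratios isolate the effect of changing a single parameter.
ratio_double_d = sample_complexity_scaling(20, 5, 0.1) / base    # d -> 2d
ratio_double_h = sample_complexity_scaling(10, 10, 0.1) / base   # H -> 2H
ratio_half_eps = sample_complexity_scaling(10, 5, 0.05) / base   # eps -> eps/2

print(ratio_double_d, ratio_double_h, ratio_half_eps)
```

Under these assumptions, doubling the dimension or halving the target accuracy multiplies the term by about 4, while doubling the horizon multiplies it by about 32.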

Abstract

Reward-free reinforcement learning (RL) considers the setting where the agent does not have access to a reward function during exploration, but must propose a near-optimal policy for an arbitrary reward function revealed only after exploring. In the tabular setting, it is well known that this is a more difficult problem than reward-aware (PAC) RL -- where the agent has access to the reward function during exploration -- with optimal sample complexities in the two settings differing by a factor of $|\mathcal{S}|$, the size of the state space. We show that this separation does not exist in the setting of linear MDPs. We first develop a computationally efficient algorithm for reward-free RL in a $d$-dimensional linear MDP with sample complexity scaling as $O(d^{2} H^{5} / \epsilon^{2})$. We then show a lower bound with matching dimension-dependence of $\Omega(d^2…

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Age of Information Optimization