Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes
Andrew Wagenmaker, Yifang Chen, Max Simchowitz, Simon S. Du, Kevin Jamieson

TL;DR
This paper demonstrates that reward-free reinforcement learning in linear MDPs is not harder than reward-aware RL, providing a computationally efficient algorithm with optimal sample complexity and matching lower bounds.
Contribution
It introduces the first efficient algorithm for reward-free RL in linear MDPs with optimal dimension dependence and establishes matching lower bounds for reward-aware RL.
Findings
Algorithm achieves near-optimal sample complexity of $\frac{d^2 H^5}{\epsilon^2}$
Lower bound matches the upper bound in dimension dependence
Exploration procedure is versatile and applies more broadly to the analysis of linear MDPs
Abstract
Reward-free reinforcement learning (RL) considers the setting where the agent does not have access to a reward function during exploration, but must propose a near-optimal policy for an arbitrary reward function revealed only after exploring. In the tabular setting, it is well known that this is a more difficult problem than reward-aware (PAC) RL -- where the agent has access to the reward function during exploration -- with optimal sample complexities in the two settings differing by a factor of $|\mathcal{S}|$, the size of the state space. We show that this separation does not exist in the setting of linear MDPs. We first develop a computationally efficient algorithm for reward-free RL in a $d$-dimensional linear MDP with sample complexity scaling as $\widetilde{\mathcal{O}}(d^2 H^5/\epsilon^2)$. We then show a lower bound with matching dimension-dependence of $\Omega(d^2…
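The two-phase protocol described in the abstract can be sketched on a toy tabular MDP. This is only an illustration of the reward-free setting, not the paper's algorithm. The uniformly random exploration policy, the fixed initial state 0, and the function names explore, estimate_model, and plan are all assumptions made for this example, and the sketch carries none of the paper's sample-complexity guarantees.

```python
import numpy as np

def explore(P, H, num_episodes, rng):
    """Phase 1: collect transition counts with a uniformly random policy.
    No reward is observed here (assumed exploration rule, for illustration only)."""
    S, A, _ = P.shape
    counts = np.zeros((S, A, S))
    for _ in range(num_episodes):
        s = 0  # assumed fixed initial state
        for _ in range(H):
            a = rng.integers(A)
            s_next = rng.choice(S, p=P[s, a])
            counts[s, a, s_next] += 1
            s = s_next
    return counts

def estimate_model(counts):
    """Empirical transition model; unvisited (s, a) pairs fall back to uniform."""
    S = counts.shape[0]
    totals = counts.sum(axis=2, keepdims=True)
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / S)

def plan(P_hat, reward, H):
    """Phase 2: finite-horizon value iteration for a reward revealed after exploring.
    Returns a greedy policy of shape (H, S) and the step-0 value estimates."""
    S, A, _ = P_hat.shape
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = reward + P_hat @ V  # shape (S, A)
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy, V

rng = np.random.default_rng(0)
S, A, H = 4, 2, 5
P = rng.dirichlet(np.ones(S), size=(S, A))  # random true dynamics, shape (S, A, S)

counts = explore(P, H, num_episodes=2000, rng=rng)
P_hat = estimate_model(counts)

# The same exploration data is reused for several rewards chosen afterwards.
for _ in range(2):
    reward = rng.random((S, A))
    policy, V = plan(P_hat, reward, H)
```

The point of the sketch is that explore never sees a reward, while plan can be called for any number of reward functions without collecting new data.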
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Age of Information Optimization
