On Reward-Free Reinforcement Learning with Linear Function Approximation
Ruosong Wang, Simon S. Du, Lin F. Yang, Ruslan Salakhutdinov

TL;DR
This paper investigates reward-free reinforcement learning with linear function approximation. It gives an algorithm with polynomial sample complexity for linear MDPs and establishes an exponential lower bound for the setting where only the optimal Q-function is linear.
Contribution
The paper introduces a reward-free RL algorithm with polynomial sample complexity for linear MDPs and proves exponential lower bounds when only the optimal Q-function is linear.
Findings
Polynomial sample complexity for linear MDPs, where both the transitions and the rewards are linear.
Exponential lower bound for reward-free RL when only the optimal Q-function is linear.
A sharp separation in sample complexity between the two linearity assumptions: polynomial for linear MDPs, exponential for a linear optimal Q-function alone.
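For reference, the two linearity assumptions contrasted above are usually formalized as follows. The notation here (a known feature map \phi into R^d, unknown measures \mu_h, unknown vectors \theta_h and w_h, step index h) follows the common convention in the linear-MDP literature and is an assumption of this sketch, not quoted from the paper.

```latex
% Linear MDP: both the transition kernel and the reward are linear in a known feature map.
P_h(s' \mid s, a) = \langle \phi(s, a), \mu_h(s') \rangle,
\qquad
r_h(s, a) = \langle \phi(s, a), \theta_h \rangle .

% Weaker assumption: only the optimal Q-function is linear in the features.
Q_h^{*}(s, a) = \langle \phi(s, a), w_h \rangle .
```

The first condition implies the second for any reward that is linear in the features, but not conversely, which is why the two settings can differ so much in sample complexity.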
Abstract
Reward-free reinforcement learning (RL) is a framework that is suitable for both the batch RL setting and the setting where there are many reward functions of interest. During the exploration phase, an agent collects samples without using a pre-specified reward function. After the exploration phase, a reward function is given, and the agent uses samples collected during the exploration phase to compute a near-optimal policy. Jin et al. [2020] showed that in the tabular setting, the agent only needs to collect a polynomial number of samples (in terms of the number of states, the number of actions, and the planning horizon) for reward-free RL. However, in practice, the number of states and actions can be large, and thus function approximation schemes are required for generalization. In this work, we give both positive and negative results for reward-free RL with linear function approximation.…
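The two-phase protocol described in the abstract can be illustrated with a minimal tabular sketch. This is not the paper's algorithm: it assumes a small finite MDP, a fixed initial state, and uniformly random exploration, and the names `explore` and `plan` are hypothetical. It only shows that transitions are collected without any reward and then reused for several reward functions.

```python
import numpy as np

def explore(P, horizon, episodes, rng):
    """Exploration phase: roll out a uniformly random policy; no reward is observed."""
    S, A, _ = P.shape
    counts = np.zeros((S, A, S))
    for _ in range(episodes):
        s = 0  # fixed initial state (an assumption of this sketch)
        for _ in range(horizon):
            a = rng.integers(A)
            s_next = rng.choice(S, p=P[s, a])
            counts[s, a, s_next] += 1
            s = s_next
    return counts

def plan(counts, reward, horizon):
    """Planning phase: finite-horizon value iteration on the empirical model."""
    S, A, _ = counts.shape
    visits = counts.sum(axis=2, keepdims=True)
    # Unvisited (s, a) pairs fall back to a uniform next-state estimate.
    P_hat = np.where(visits > 0, counts / np.maximum(visits, 1), 1.0 / S)
    V = np.zeros(S)
    policy = np.zeros((horizon, S), dtype=int)
    for h in reversed(range(horizon)):
        Q = reward + P_hat @ V  # shape (S, A)
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy, V

H, EPISODES = 5, 500
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(3, 2))  # random 3-state, 2-action MDP
counts = explore(P, horizon=H, episodes=EPISODES, rng=rng)

# The same exploration data serves two different reward functions.
reward_a = rng.random((3, 2))
reward_b = 1.0 - reward_a
policy_a, V_a = plan(counts, reward_a, horizon=H)
policy_b, V_b = plan(counts, reward_b, horizon=H)
```

The exploration data is gathered once and `plan` is called once per reward function, which is the property that makes the framework useful when many reward functions are of interest.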
Taxonomy
Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Advanced Bandit Algorithms Research
