Hybrid Reinforcement Learning Breaks Sample Size Barriers in Linear MDPs
Kevin Tan, Wei Fan, Yuting Wei

TL;DR
This paper introduces computationally efficient hybrid RL algorithms for linear MDPs that improve on the sample complexity bounds of purely offline and purely online RL, without relying on the single-policy concentrability assumption.
Contribution
It develops the first algorithms with sharp theoretical guarantees for hybrid RL in linear MDPs, improving sample complexity bounds without single-policy concentrability.
Findings
Achieves sharper error bounds in offline RL
Attains improved regret bounds in online RL
Establishes the tightest theoretical guarantees for hybrid RL in linear MDPs
Abstract
Hybrid Reinforcement Learning (RL), where an agent learns from both an offline dataset and online explorations in an unknown environment, has garnered significant recent interest. A crucial question posed by Xie et al. (2022) is whether hybrid RL can improve upon the existing lower bounds established in purely offline and purely online RL without relying on the single-policy concentrability assumption. While Li et al. (2023) provided an affirmative answer to this question in the tabular PAC RL case, the question remains unsettled for both the regret-minimizing RL case and the non-tabular case. In this work, building upon recent advancements in offline RL and reward-agnostic exploration, we develop computationally efficient algorithms for both PAC and regret-minimizing RL with linear function approximation, without single-policy concentrability. We demonstrate that these algorithms…
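For readers new to the setting, the two technical terms in the abstract can be written out as follows. This is a sketch of the standard textbook definitions, not the paper's own statement: the symbols (feature map \phi, dimension d, horizon H, measures \mu_h, reward vectors \theta_h, occupancy measures d_h, constant C^\star) are notation assumed here for illustration, and the paper may use different conventions.

```latex
% Linear MDP (standard definition; notation assumed, not taken from the paper):
% transitions and rewards are linear in a known feature map \phi : S x A -> R^d.
\[
  P_h(s' \mid s, a) = \langle \phi(s, a), \mu_h(s') \rangle,
  \qquad
  r_h(s, a) = \langle \phi(s, a), \theta_h \rangle,
  \qquad h = 1, \dots, H.
\]

% Single-policy concentrability, in its common tabular form: the offline
% behavior distribution d_h^{\mathrm{off}} must cover the occupancy measure
% of an optimal policy \pi^\star up to a finite constant C^\star.
\[
  \max_{h, s, a} \;
  \frac{d_h^{\pi^\star}(s, a)}{d_h^{\mathrm{off}}(s, a)}
  \;\le\; C^\star < \infty .
\]
```

The abstract's claim is that the proposed hybrid algorithms do not require a bound of the second kind to hold for the offline dataset.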
Taxonomy
Topics: Reinforcement Learning in Robotics
