Augmenting Online RL with Offline Data is All You Need: A Unified Hybrid RL Algorithm Design and Analysis

Ruiquan Huang; Donghao Li; Chengshuai Shi; Cong Shen; Jing Yang

arXiv:2505.13768 · cs.LG · July 1, 2025

TL;DR

This paper introduces a unified hybrid reinforcement learning algorithm that combines an offline dataset with online interactions. It outperforms pure online and pure offline methods and achieves state-of-the-art results under two metrics: sub-optimality gap and online learning regret.

Contribution

The paper presents a novel unified algorithm and analysis for hybrid RL that leverages offline datasets to enhance online learning performance, with theoretical guarantees and empirical validation.

Findings

01

Outperforms pure online and offline algorithms in sub-optimality and regret.

02

Achieves state-of-the-art results under both learning metrics, with offline samples discounted by a new concentrability coefficient.

03

Validates theoretical results in linear bandits and MDPs.

Abstract

This paper investigates a hybrid learning framework for reinforcement learning (RL) in which the agent can leverage both an offline dataset and online interactions to learn the optimal policy. We present a unified algorithm and analysis and show that augmenting confidence-based online RL algorithms with the offline dataset outperforms any pure online or offline algorithm alone and achieves state-of-the-art results under two learning metrics, i.e., sub-optimality gap and online learning regret. Specifically, we show that our algorithm achieves a sub-optimality gap $\tilde{O}(1/(N_{0}/C(\pi^{*} \mid \rho) + N_{1}))$, where $C(\pi^{*} \mid \rho)$ is a new concentrability coefficient, $N_{0}$ and $N_{1}$ are the numbers of offline and online samples, respectively. For regret minimization, we show that it achieves a constant $\tilde{O}(N_{1}/(N_{0}/C(\pi^{-} \mid \rho) + N_{1}))$ …
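To make the role of the offline data concrete, the following minimal sketch evaluates the two quantities that appear inside the abstract's expressions: the combined sample count $N_{0}/C + N_{1}$ and the ratio $N_{1}/(N_{0}/C + N_{1})$. The function names and the numeric values are illustrative assumptions, not taken from the paper, and constants and logarithmic factors hidden by $\tilde{O}$ are ignored.

```python
def effective_samples(n_offline: float, n_online: float, concentrability: float) -> float:
    """Return N_0 / C + N_1 (hypothetical helper, not from the paper).

    Offline samples are discounted by the concentrability coefficient C,
    then added to the online sample count.
    """
    if concentrability <= 0:
        raise ValueError("concentrability must be positive")
    if n_offline < 0 or n_online < 0:
        raise ValueError("sample counts must be non-negative")
    return n_offline / concentrability + n_online


def online_fraction(n_offline: float, n_online: float, concentrability: float) -> float:
    """Return N_1 / (N_0 / C + N_1), the ratio inside the regret expression."""
    n_eff = effective_samples(n_offline, n_online, concentrability)
    if n_eff == 0:
        raise ValueError("at least one sample is required")
    return n_online / n_eff


if __name__ == "__main__":
    # Illustrative values only: 1000 offline samples, 500 online samples, C = 4.
    print(effective_samples(1000, 500, 4.0))
    print(online_fraction(1000, 500, 4.0))
    # With no offline data the ratio is 1, i.e. the pure online case.
    print(online_fraction(0, 500, 4.0))
```

Under these assumed values, a smaller concentrability coefficient makes each offline sample count for more, which lowers the ratio relative to the pure online case.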

Taxonomy

Topics: Advanced Bandit Algorithms Research · Metaheuristic Optimization Algorithms Research · Smart Parking Systems Research