MOORL: A Framework for Integrating Offline-Online Reinforcement Learning

Gaurav Chaudhary; Wassim Uddin Mondal; Laxmidhar Behera

arXiv:2506.09574 · cs.LG · December 23, 2025

Open Access

TL;DR

MOORL is a hybrid reinforcement learning framework that combines offline and online methods through a meta-policy, improving exploration, stability, and performance across diverse tasks with minimal additional complexity.

Contribution

Introduces MOORL, a hybrid offline-online RL framework whose meta-policy adapts across offline and online trajectories, improving exploration and stability with minimal added computational overhead.

Findings

01. Outperforms state-of-the-art offline and hybrid RL methods on benchmark tasks.

02. Demonstrates stable Q-function learning without extra complexity.

03. Achieves consistent improvements with minimal computational overhead.

Abstract

Sample efficiency and exploration remain critical challenges in Deep Reinforcement Learning (DRL), particularly in complex domains. Offline RL, which enables agents to learn optimal policies from static, pre-collected datasets, has emerged as a promising alternative. However, offline RL is constrained by issues such as out-of-distribution (OOD) actions that limit policy performance and generalization. To overcome these limitations, we propose Meta Offline-Online Reinforcement Learning (MOORL), a hybrid framework that unifies offline and online RL for efficient and scalable learning. While previous hybrid methods rely on extensive design components and added computational complexity to utilize offline data effectively, MOORL introduces a meta-policy that seamlessly adapts across offline and online trajectories. This enables the agent to leverage offline data for robust initialization…

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Advanced Bandit Algorithms Research