MOORL: A Framework for Integrating Offline-Online Reinforcement Learning
Gaurav Chaudhary, Wassim Uddin Mondal, Laxmidhar Behera

TL;DR
MOORL is a hybrid reinforcement learning framework that combines offline and online methods through a meta-policy, improving exploration, stability, and performance across diverse tasks with minimal additional complexity.
Contribution
Introduces MOORL, a hybrid offline-online RL framework whose meta-policy adapts across offline and online trajectories, improving exploration and stability with minimal added computational overhead.
Findings
Outperforms state-of-the-art offline and hybrid RL methods on benchmark tasks.
Demonstrates stable Q-function learning without extra complexity.
Achieves consistent improvements with minimal computational overhead.
Abstract
Sample efficiency and exploration remain critical challenges in Deep Reinforcement Learning (DRL), particularly in complex domains. Offline RL, which enables agents to learn optimal policies from static, pre-collected datasets, has emerged as a promising alternative. However, offline RL is constrained by issues such as out-of-distribution (OOD) actions that limit policy performance and generalization. To overcome these limitations, we propose Meta Offline-Online Reinforcement Learning (MOORL), a hybrid framework that unifies offline and online RL for efficient and scalable learning. While previous hybrid methods rely on extensive design components and added computational complexity to utilize offline data effectively, MOORL introduces a meta-policy that seamlessly adapts across offline and online trajectories. This enables the agent to leverage offline data for robust initialization…
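The abstract describes a meta-policy that adapts across offline and online trajectories but, in the portion shown here, does not spell out the update rule. As an illustration only, the sketch below assumes a Reptile-style first-order meta-update over a linear Q-function: the shared parameters are adapted separately on an offline batch and an online batch, then moved toward the average of the two adapted copies. Every name in it (`td_loss_grad`, `inner_adapt`, `meta_step`, `make_batch`), the linear Q-function, the synthetic one-step data, and the learning rates are hypothetical choices for this example, not details taken from the paper.

```python
import numpy as np


def td_loss_grad(theta, batch, gamma=0.99):
    """Semi-gradient of the mean squared TD error for a linear Q-function.

    Q(s, a) = phi(s, a) @ theta. The bootstrapped target is treated as a
    constant, as in standard Q-learning. (Assumed setup, not the paper's.)
    """
    phi, reward, phi_next, done = batch
    q = phi @ theta
    target = reward + gamma * (1.0 - done) * (phi_next @ theta)
    td_error = q - target
    return phi.T @ td_error / len(reward)


def inner_adapt(theta, batch, lr=0.1, steps=5):
    """Take a few gradient steps on one data source, starting from theta."""
    adapted = theta.copy()
    for _ in range(steps):
        adapted -= lr * td_loss_grad(adapted, batch)
    return adapted


def meta_step(theta, offline_batch, online_batch, meta_lr=0.5):
    """Reptile-style update (an assumption): move the shared parameters
    toward the average of the offline-adapted and online-adapted copies."""
    adapted = [inner_adapt(theta, b) for b in (offline_batch, online_batch)]
    return theta + meta_lr * (np.mean(adapted, axis=0) - theta)


def make_batch(rng, true_theta, n=64, shift=0.0):
    """Synthetic one-step transitions (done = 1), so the TD target equals
    the reward. `shift` mimics a different state-action distribution."""
    dim = len(true_theta)
    phi = rng.normal(loc=shift, size=(n, dim))
    phi_next = np.zeros((n, dim))
    reward = phi @ true_theta
    done = np.ones(n)
    return phi, reward, phi_next, done


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_theta = np.array([1.0, -2.0, 0.5, 3.0])
    theta = np.zeros_like(true_theta)
    for _ in range(200):
        offline = make_batch(rng, true_theta)             # static-dataset stand-in
        online = make_batch(rng, true_theta, shift=0.5)   # fresh-interaction stand-in
        theta = meta_step(theta, offline, online)
    print("parameter error:", np.linalg.norm(theta - true_theta))
```

In this toy setting both data sources share the same underlying reward model, so the shared parameters converge even though the two batches come from different feature distributions. A real agent would replace the synthetic batches with samples from a fixed dataset and from environment rollouts, and the linear model with a neural network.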
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Advanced Bandit Algorithms Research
