Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization

Kun Lei; Zhengmao He; Chenhao Lu; Kaizhe Hu; Yang Gao; Huazhe Xu

arXiv:2311.03351 · cs.LG · March 19, 2024 · 2 cites

TL;DR

Uni-O4 unifies offline and online deep reinforcement learning using an on-policy approach, enabling seamless transfer, improved performance, and safe multi-step policy evaluation, demonstrated through robot tasks and benchmarks.

Contribution

The paper introduces Uni-O4, a novel on-policy framework that unifies offline and online RL without extra regularization, enhancing flexibility and performance.

Findings

01 Achieves state-of-the-art results in offline and fine-tuning tasks

02 Enables seamless offline-online transfer with ensemble policies

03 Demonstrates rapid deployment in real-world robot environments

Abstract

Combining offline and online reinforcement learning (RL) is crucial for efficient and safe learning. However, previous approaches treat offline and online learning as separate procedures, resulting in redundant designs and limited performance. We ask: Can we achieve straightforward yet effective offline and online learning without introducing extra conservatism or regularization? In this study, we propose Uni-O4, which uses an on-policy objective for both offline and online learning. Owing to the alignment of objectives in the two phases, the RL agent can transfer between offline and online learning seamlessly. This property enhances the flexibility of the learning paradigm, allowing for arbitrary combinations of pretraining, fine-tuning, offline, and online learning. In the offline phase, specifically, Uni-O4 leverages diverse ensemble policies to address the mismatch issues between…
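
As an illustration of what "one on-policy objective for both phases" can look like, the sketch below computes a PPO-style clipped surrogate. The abstract as shown here does not name the objective, so the choice of a clipped surrogate, the function name clipped_surrogate, and the clip_eps default are all assumptions made for this example, not the authors' implementation.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO-style clipped surrogate (a quantity to maximize).

    Hypothetical helper for illustration only; not taken from the Uni-O4 code.
    logp_new / logp_old: log-probabilities of the taken actions under the
    policy being optimized and under the policy that generated the data.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # importance ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic choice between the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# The same function can score an offline batch or an online batch;
# only the source of logp_old and the advantages differs (assumption).
offline_value = clipped_surrogate([0.0, 0.0], [0.0, 0.0], [1.0, -1.0])
online_value = clipped_surrogate([math.log(2.0)], [0.0], [1.0])
```

Because the function depends only on log-probabilities and advantage estimates, nothing in it is specific to where the data came from, which is the sense in which a single objective could serve both offline and online updates.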

Code & Models

Repositories

Lei-Kun/Uni-O4 (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics