Hybrid Transfer Reinforcement Learning: Provable Sample Efficiency from Shifted-Dynamics Data
Chengrui Qu, Laixi Shi, Kishan Panaganti, Pengcheng You, Adam Wierman

TL;DR
This paper introduces a hybrid transfer reinforcement learning framework that leverages shifted-dynamics offline data to improve sample efficiency, providing theoretical guarantees and demonstrating superior performance over standard online RL.
Contribution
The paper proposes HySRL, a transfer algorithm that uses prior information on the degree of the dynamics shift to achieve provably better sample complexity in the target RL task.
Findings
HySRL outperforms baseline online RL in experiments.
Shifted-dynamics data alone does not reduce sample complexity without prior shift information.
Prior information on the degree of the dynamics shift enables problem-dependent sample efficiency.
Abstract
Online reinforcement learning (RL) typically requires high-stakes online interaction data to learn a policy for a target task. This prompts interest in leveraging historical data to improve sample efficiency. The historical data may come from outdated or related source environments with different dynamics. It remains unclear how to effectively use such data in the target task to provably enhance learning and sample efficiency. To address this, we propose a hybrid transfer RL (HTRL) setting, where an agent learns in a target environment while accessing offline data from a source environment with shifted dynamics. We show that, without information on the dynamics shift, general shifted-dynamics data, even with subtle shifts, does not reduce sample complexity in the target environment. However, with prior information on the degree of the dynamics shift, we design HySRL, a transfer…
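To make the notion of a "degree of dynamics shift" concrete, here is a minimal sketch in a tiny tabular MDP. It is an illustration only, not the paper's algorithm or its formal definition: the use of total variation distance, the threshold `beta`, and the helper names `tv_distance` and `shifted_pairs` are all assumptions introduced for this example. Both transition models are written out explicitly so the comparison is visible; in the HTRL setting the target dynamics are unknown to the learner.

```python
# Illustrative sketch only: the tabular setup, the total-variation measure,
# and the threshold `beta` are assumptions, not definitions from the paper.

def tv_distance(p, q):
    """Total variation distance between two distributions over the same states."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def shifted_pairs(source, target, beta):
    """Return the (state, action) pairs whose transition rows differ by at least beta in TV."""
    return sorted(
        sa for sa in target
        if tv_distance(source[sa], target[sa]) >= beta
    )

# Two states (0, 1) and two actions ("a", "b").
# Each row is P(next_state | state, action) over next states [0, 1].
source = {
    (0, "a"): [0.9, 0.1],
    (0, "b"): [0.5, 0.5],
    (1, "a"): [0.2, 0.8],
    (1, "b"): [0.0, 1.0],
}
target = dict(source)
target[(1, "a")] = [0.7, 0.3]  # only this pair's dynamics are shifted

flagged = shifted_pairs(source, target, beta=0.25)
print(flagged)
```

In this toy example only the pair `(1, "a")` differs between the two models, so source data for the other three pairs would still describe the target environment accurately.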
Taxonomy
Topics: Reinforcement Learning in Robotics · Traffic control and management · Adaptive Dynamic Programming Control
