Transfer Q-learning

Elynn Chen; Sai Li; Michael I. Jordan

arXiv:2202.04709·cs.LG·October 21, 2025

Open Access

TL;DR

This paper introduces transfer Q-learning algorithms for time-inhomogeneous finite-horizon MDPs, enabling knowledge transfer across tasks and stages, with theoretical guarantees and empirical validation in complex RL settings.

Contribution

It develops transfer Q-learning algorithms with a re-targeting step for cross-stage and cross-task transfer, and provides what it presents as the first theoretical analysis of transfer learning in RL.

Findings

01. Faster convergence rate of Q* estimation in offline RL transfer.

02. Lower regret bounds in offline-to-online RL transfer.

03. Empirical validation on synthetic and real datasets supports theoretical results.

Abstract

Time-inhomogeneous finite-horizon Markov decision processes (MDPs) are frequently employed to model decision-making in dynamic treatment regimes and other statistical reinforcement learning (RL) scenarios. These fields, especially healthcare and business, often face challenges such as high-dimensional state spaces and time-inhomogeneity of the MDP, compounded by insufficient sample availability, which complicates informed decision-making. To overcome these challenges, we investigate knowledge transfer within time-inhomogeneous finite-horizon MDPs by leveraging data from both a target RL task and several related source tasks. We have developed transfer learning (TL) algorithms that are adaptable to both batch and online $Q$-learning, integrating valuable insights from offline source studies. The proposed transfer $Q$-learning algorithm contains a novel re-targeting step that…
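
For orientation, the setting named in the abstract (batch Q-learning in a time-inhomogeneous finite-horizon MDP) can be sketched for a single task as backward induction over stage-specific Q-functions. This is not the paper's transfer algorithm: it uses no source tasks and has no re-targeting step, and it assumes a small discrete state space with tabular Q-values, which would not suit the high-dimensional states the abstract mentions. The function names and the toy transitions are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def batch_q_backward(data, actions, horizon):
    """Estimate one Q-table per stage from offline transitions.

    data[t] is a list of (state, action, reward, next_state) tuples observed
    at stage t. Returns q, where q[t][(state, action)] is the estimated value
    of taking `action` in `state` at stage t and acting greedily afterwards.
    """
    q = [dict() for _ in range(horizon)]
    # Work backwards: the last stage has no future value to bootstrap from.
    for t in reversed(range(horizon)):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for s, a, r, s_next in data[t]:
            if t == horizon - 1:
                target = r
            else:
                # Assumption: unseen (state, action) pairs are valued at 0.0.
                target = r + max(q[t + 1].get((s_next, b), 0.0) for b in actions)
            sums[(s, a)] += target
            counts[(s, a)] += 1
        # Tabular regression: average the targets for each (state, action).
        q[t] = {key: sums[key] / counts[key] for key in sums}
    return q


def greedy_action(q_t, state, actions):
    """Pick the action with the largest estimated Q-value at one stage."""
    return max(actions, key=lambda a: q_t.get((state, a), 0.0))


# Hypothetical two-stage toy data; rewards differ by stage (time-inhomogeneous).
toy_data = [
    [("s0", 0, 2.0, "s1"), ("s0", 1, 0.0, "s1")],  # stage 0
    [("s1", 0, 0.0, None), ("s1", 1, 1.0, None)],  # stage 1 (final)
]
toy_actions = [0, 1]
toy_q = batch_q_backward(toy_data, toy_actions, horizon=2)
```

In this toy case the stage-0 value of action 0 is its reward of 2.0 plus the best stage-1 value of 1.0, so the greedy policy picks action 0 at stage 0 and action 1 at stage 1. Keeping a separate table per stage is what lets the estimate differ across time steps.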

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference