Transfer Q-learning
Elynn Chen, Sai Li, Michael I. Jordan

TL;DR
This paper introduces transfer Q-learning algorithms for time-inhomogeneous finite-horizon MDPs, enabling knowledge transfer across tasks and stages, with theoretical guarantees and empirical validation in complex RL settings.
Contribution
It develops novel transfer Q-learning algorithms with re-targeting for cross-stage and cross-task transfer, providing the first theoretical analysis of transfer learning in RL.
Findings
Faster convergence rate of Q* estimation in offline RL transfer.
Lower regret bounds in offline-to-online RL transfer.
Empirical validation on synthetic and real datasets supports theoretical results.
Abstract
Time-inhomogeneous finite-horizon Markov decision processes (MDPs) are frequently employed to model decision-making in dynamic treatment regimes and other statistical reinforcement learning (RL) scenarios. These fields, especially healthcare and business, often face challenges such as high-dimensional state spaces and time-inhomogeneity of the MDP, compounded by insufficient sample availability, which complicates informed decision-making. To overcome these challenges, we investigate knowledge transfer within time-inhomogeneous finite-horizon MDPs by leveraging data from both a target RL task and several related source tasks. We have developed transfer learning (TL) algorithms that are adaptable for both batch and online Q-learning, integrating valuable insights from offline source studies. The proposed transfer Q-learning algorithm contains a novel re-targeting step that…
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference
