The Effect of Q-function Reuse on the Total Regret of Tabular, Model-Free, Reinforcement Learning
Volodymyr Tkachuk, Sriram Ganapathi Subramanian, Matthew E. Taylor

TL;DR
This paper provides the first theoretical analysis of Q-function reuse in tabular, model-free reinforcement learning, demonstrating it can lead to regret bounds independent of state or action space size, supported by empirical evidence.
Contribution
It offers the first theoretical regret bounds for Q-function reuse in tabular, model-free RL, showing potential for significant sample complexity reduction.
Findings
Regret bound independent of state and action space size
Empirical results support theoretical insights
Q-function reuse improves learning efficiency
Abstract
Some reinforcement learning methods suffer from high sample complexity causing them to not be practical in real-world situations. Q-function reuse, a transfer learning method, is one way to reduce the sample complexity of learning, potentially improving usefulness of existing algorithms. Prior work has shown the empirical effectiveness of Q-function reuse for various environments when applied to model-free algorithms. To the best of our knowledge, there has been no theoretical work showing the regret of Q-function reuse when applied to the tabular, model-free setting. We aim to bridge the gap between theoretical and empirical work in Q-function reuse by providing some theoretical insights on the effectiveness of Q-function reuse when applied to the Q-learning with UCB-Hoeffding algorithm. Our main contribution is showing that in a specific case if Q-function reuse is…
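To make the setting in the abstract concrete, the following is a minimal Python sketch of tabular, episodic Q-learning with a Hoeffding-style UCB bonus, with an optional starting Q-table standing in for Q-function reuse. It is an illustration under stated assumptions, not the paper's algorithm or experimental setup: the function name `ucb_hoeffding_q_learning`, the parameters `q_init`, `c`, and `p`, the fixed start state, and the choice to model reuse as plain initialization from a supplied table are all hypothetical. The learning rate and bonus follow the commonly cited form of the UCB-Hoeffding update.

```python
import numpy as np


def ucb_hoeffding_q_learning(P, R, H, episodes, q_init=None, c=1.0, p=0.05, seed=0):
    """Tabular episodic Q-learning with a Hoeffding-style exploration bonus.

    P[s, a] is a next-state distribution, R[s, a] a reward in [0, 1].
    q_init (hypothetical parameter) is an optional (H, S, A) table used as the
    starting Q-function; here it stands in for Q-function reuse.
    """
    rng = np.random.default_rng(seed)
    S, A = R.shape
    # Without a reused table, start optimistically at the horizon H.
    if q_init is None:
        Q = np.full((H, S, A), float(H))
    else:
        Q = np.array(q_init, dtype=float)  # copy, so the caller's table is untouched
    N = np.zeros((H, S, A), dtype=int)     # visit counts per (step, state, action)
    iota = np.log(S * A * H * episodes / p)  # log factor used in the bonus
    total_reward = 0.0

    for _ in range(episodes):
        s = 0  # assumption: every episode starts in state 0
        for h in range(H):
            a = int(np.argmax(Q[h, s]))          # act greedily on the optimistic Q
            r = R[s, a]
            s_next = rng.choice(S, p=P[s, a])
            N[h, s, a] += 1
            t = N[h, s, a]
            alpha = (H + 1) / (H + t)            # step-dependent learning rate
            bonus = c * np.sqrt(H**3 * iota / t)  # Hoeffding-style bonus
            # Value of the next state, clipped at H; zero after the last step.
            v_next = 0.0 if h == H - 1 else min(H, Q[h + 1, s_next].max())
            Q[h, s, a] = (1 - alpha) * Q[h, s, a] + alpha * (r + v_next + bonus)
            total_reward += r
            s = s_next
    return Q, total_reward
```

Calling the function once with `q_init=None` and once with a table obtained elsewhere (for example, from a related source task) gives a simple way to compare cumulative reward with and without reuse on a toy problem. Any such comparison only illustrates the idea and says nothing about the regret bounds discussed in the paper.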
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Data Stream Mining Techniques
