The Effect of Q-function Reuse on the Total Regret of Tabular, Model-Free, Reinforcement Learning

Volodymyr Tkachuk; Sriram Ganapathi Subramanian; Matthew E. Taylor

arXiv:2103.04416 · cs.LG · March 9, 2021 · 1 citation

TL;DR

This paper provides the first theoretical analysis of Q-function reuse in tabular, model-free reinforcement learning, demonstrating it can lead to regret bounds independent of state or action space size, supported by empirical evidence.

Contribution

It offers the first theoretical regret bounds for Q-function reuse in tabular, model-free RL, showing potential for significant sample complexity reduction.

Findings

1. Regret bound independent of state and action space size
2. Empirical results support theoretical insights
3. Q-function reuse improves learning efficiency

Abstract

Some reinforcement learning methods suffer from high sample complexity causing them to not be practical in real-world situations. $Q$-function reuse, a transfer learning method, is one way to reduce the sample complexity of learning, potentially improving usefulness of existing algorithms. Prior work has shown the empirical effectiveness of $Q$-function reuse for various environments when applied to model-free algorithms. To the best of our knowledge, there has been no theoretical work showing the regret of $Q$-function reuse when applied to the tabular, model-free setting. We aim to bridge the gap between theoretical and empirical work in $Q$-function reuse by providing some theoretical insights on the effectiveness of $Q$-function reuse when applied to the $Q$-learning with UCB-Hoeffding algorithm. Our main contribution is showing that in a specific case if $Q$-function reuse is…
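
For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sketch of the standard tabular Q-learning with UCB-Hoeffding update (learning rate (H+1)/(H+t), bonus proportional to sqrt(H^3 ι / t)), with an optional prior table standing in for Q-function reuse. The function names, the `Q_prior` argument, and the choice to model reuse as initializing the table from a source task are assumptions made for illustration; the abstract does not specify the paper's exact procedure, and the constants `c` and `iota` are placeholders.

```python
import math
import numpy as np

def init_tables(H, S, A, Q_prior=None):
    """Build Q, V, and visit-count tables for an episodic MDP with horizon H.

    Q_prior is a hypothetical (H, S, A) array from a source task; passing it
    is one assumed way to model Q-function reuse. Without it, Q starts at
    the optimistic value H, as in standard UCB-Hoeffding.
    """
    if Q_prior is None:
        Q = np.full((H, S, A), float(H))
    else:
        Q = np.array(Q_prior, dtype=float)  # copy so the prior is not mutated
    V = np.zeros((H + 1, S))                # V[H] stays 0 (end of episode)
    V[:H] = np.minimum(H, Q.max(axis=2))
    N = np.zeros((H, S, A), dtype=int)
    return Q, V, N

def ucb_hoeffding_update(Q, V, N, h, s, a, r, s_next, H, c=1.0, iota=1.0):
    """Apply one in-place update at step h (0-indexed) for transition (s, a, r, s_next)."""
    N[h, s, a] += 1
    t = N[h, s, a]
    alpha = (H + 1) / (H + t)               # step-dependent learning rate
    bonus = c * math.sqrt(H**3 * iota / t)  # Hoeffding-style exploration bonus
    target = r + V[h + 1, s_next] + bonus
    Q[h, s, a] = (1 - alpha) * Q[h, s, a] + alpha * target
    V[h, s] = min(H, Q[h, s].max())         # value estimates are clipped at H
```

Calling `init_tables(H, S, A)` gives the from-scratch learner, while `init_tables(H, S, A, Q_prior=...)` gives the reuse variant under the assumption above; the update rule is identical in both cases.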

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Data Stream Mining Techniques