Can Q-Learning be Improved with Advice?

Noah Golowich; Ankur Moitra

arXiv:2110.13052·cs.LG·October 26, 2021



TL;DR

This paper investigates how predictions about the optimal Q-value function can improve regret bounds in reinforcement learning. When the predictions satisfy a reasonably weak condition called distillation, the bounds depend only on the state-action pairs where the predictions are grossly inaccurate. The work extends algorithms with predictions from simple online problems to RL.

Contribution

The paper introduces a method that uses Q-value predictions satisfying the distillation condition to improve regret bounds in RL, and develops an algorithm that still achieves sublinear regret when the predictions are arbitrary.

Findings

1. Improved regret bounds when predictions satisfy the distillation condition.

2. The algorithm achieves sublinear regret with arbitrary predictions.

3. Extension of prediction-based algorithms from simple online problems to RL.

Abstract

Despite rapid progress in theoretical reinforcement learning (RL) over the last few years, most of the known guarantees are worst-case in nature, failing to take advantage of structure that may be known a priori about a given RL problem at hand. In this paper we address the question of whether worst-case lower bounds for regret in online learning of Markov decision processes (MDPs) can be circumvented when information about the MDP, in the form of predictions about its optimal $Q$-value function, is given to the algorithm. We show that when the predictions about the optimal $Q$-value function satisfy a reasonably weak condition we call distillation, then we can improve regret bounds by replacing the set of state-action pairs with the set of state-action pairs on which the predictions are grossly inaccurate. This improvement holds for both uniform regret bounds and gap-based ones.…
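
To make the central quantity in the abstract concrete, the sketch below counts the state-action pairs on which a predicted Q-value table is far from the optimal one. It is an illustration only, not the paper's algorithm. The function name `grossly_inaccurate_pairs`, the fixed `threshold`, the dictionary encoding, and the toy numbers are all assumptions made here; this excerpt does not give the paper's precise definitions of "grossly inaccurate" or of distillation.

```python
from typing import Dict, Set, Tuple

# (state, action) index pair; a hypothetical encoding chosen for this sketch.
StateAction = Tuple[int, int]


def grossly_inaccurate_pairs(
    q_star: Dict[StateAction, float],
    q_pred: Dict[StateAction, float],
    threshold: float,
) -> Set[StateAction]:
    """Return the pairs whose predicted Q-value is off by more than `threshold`.

    The fixed threshold is an illustrative assumption, not the paper's definition.
    """
    return {sa for sa, q in q_star.items() if abs(q_pred[sa] - q) > threshold}


# Toy values, made up for illustration only.
q_star = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.8, (1, 1): 0.2}
q_pred = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.1, (1, 1): 0.25}

bad = grossly_inaccurate_pairs(q_star, q_pred, threshold=0.1)
print(f"{len(bad)} of {len(q_star)} state-action pairs are inaccurate: {sorted(bad)}")
```

In this toy table only one of the four pairs is flagged. Under the abstract's claim, a regret bound would then scale with that smaller set rather than with all state-action pairs. A learner does not know the optimal table in practice, so this computation is a diagnostic for intuition, not a step an algorithm could run.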


Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management