Can Q-Learning be Improved with Advice?
Noah Golowich, Ankur Moitra

TL;DR
This paper investigates how predictions about the optimal Q-value function can improve regret bounds in reinforcement learning (RL), particularly when the predictions satisfy a weak condition called distillation, thereby extending algorithms with predictions to complex RL settings.
Contribution
It introduces a way to use Q-value predictions that are only weakly accurate to improve regret bounds in RL, and develops an algorithm that still achieves sublinear regret when the predictions are arbitrary.
Findings
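Improved regret bounds when predictions satisfy the distillation condition, with the full set of state-action pairs replaced by the pairs on which the predictions are grossly inaccurate.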
The algorithm achieves sublinear regret with arbitrary predictions.
Extension of prediction-based algorithms from simple online problems to RL.
Abstract
Despite rapid progress in theoretical reinforcement learning (RL) over the last few years, most of the known guarantees are worst-case in nature, failing to take advantage of structure that may be known a priori about a given RL problem at hand. In this paper we address the question of whether worst-case lower bounds for regret in online learning of Markov decision processes (MDPs) can be circumvented when information about the MDP, in the form of predictions about its optimal Q-value function, is given to the algorithm. We show that when the predictions about the optimal Q-value function satisfy a reasonably weak condition we call distillation, then we can improve regret bounds by replacing the set of state-action pairs with the set of state-action pairs on which the predictions are grossly inaccurate. This improvement holds for both uniform regret bounds and gap-based ones.…
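To make the abstract's central claim concrete, the sketch below computes, for a toy tabular problem, the set that takes the place of the full set of state-action pairs in the improved bounds. It is illustrative only, under stated assumptions: the threshold `eps`, the dictionary representation, and the helper names `grossly_inaccurate_pairs` and `regret_size_parameters` are hypothetical; the paper's precise definition of "grossly inaccurate" may differ; and the distillation condition itself is not modeled. Because the set depends on the unknown optimal Q-value function Q*, it is a quantity that appears in the analysis, not something a learner would compute.

```python
from collections.abc import Hashable
from typing import Dict, Set, Tuple

# A state-action pair (s, a); states and actions can be any hashable labels.
StateAction = Tuple[Hashable, Hashable]


def grossly_inaccurate_pairs(
    q_star: Dict[StateAction, float],
    q_pred: Dict[StateAction, float],
    eps: float,
) -> Set[StateAction]:
    """Pairs whose predicted value misses Q*(s, a) by more than eps.

    Hypothetical threshold rule for illustration only; a pair with no
    prediction at all is treated as inaccurate.
    """
    return {
        sa
        for sa, q in q_star.items()
        if sa not in q_pred or abs(q_pred[sa] - q) > eps
    }


def regret_size_parameters(
    q_star: Dict[StateAction, float],
    q_pred: Dict[StateAction, float],
    eps: float,
) -> Tuple[int, int]:
    """Return (|S x A|, size of the grossly inaccurate set).

    Worst-case bounds are stated in terms of all state-action pairs; per the
    abstract, bounds under distillation use only the inaccurate set instead.
    """
    return len(q_star), len(grossly_inaccurate_pairs(q_star, q_pred, eps))


# Toy example: 3 states x 2 actions, predictions wrong on two pairs.
q_star = {(s, a): float(s + a) for s in range(3) for a in range(2)}
q_pred = dict(q_star)
q_pred[(0, 1)] = 5.0   # true value is 1.0
q_pred[(2, 0)] = -1.0  # true value is 2.0
print(regret_size_parameters(q_star, q_pred, eps=0.5))  # prints (6, 2)
```

Passing `q_pred=q_star` yields `(6, 0)`, the case where every prediction is within `eps` of the true value.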
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management
