Quantum Computing Provides Exponential Regret Improvement in Episodic Reinforcement Learning
Bhargav Ganguly, Yulian Wu, Di Wang, Vaneet Aggarwal

TL;DR
This paper introduces a quantum reinforcement learning algorithm that significantly reduces regret, achieving exponential improvement over classical methods by leveraging quantum mean estimation techniques.
Contribution
The paper proposes a novel quantum UCB-based framework for episodic RL that attains exponential regret reduction compared to classical algorithms.
Findings
Quantum algorithm achieves regret of $\tilde{\mathcal{O}}(1)$ in $K$ episodes, compared to $\tilde{\mathcal{O}}(\sqrt{K})$ for classical algorithms.
Quantum mean estimation gives a quadratic reduction in the number of i.i.d. samples needed to estimate the mean of sub-Gaussian random variables.
Experiments demonstrate performance gains in RL environments.
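To make the scaling in the findings above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithm or proof: `classical_samples` uses the standard Hoeffding-style bound for sub-Gaussian variables, `quantum_samples` encodes only the $1/\epsilon$ scaling with a hypothetical constant `c`, and `cumulative_bonus` is a heuristic that sums per-episode confidence widths shrinking like $k^{-1/2}$ (classical rate) or $k^{-1}$ (quantum rate). All function names and default parameters are invented for this example.

```python
import math


def classical_samples(eps, sigma=1.0, delta=0.05):
    """Samples sufficient for |empirical mean - mean| <= eps with prob. >= 1 - delta.

    Standard sub-Gaussian tail bound: n >= 2 * sigma^2 * ln(2 / delta) / eps^2.
    """
    return math.ceil(2.0 * sigma ** 2 * math.log(2.0 / delta) / eps ** 2)


def quantum_samples(eps, sigma=1.0, delta=0.05, c=1.0):
    """Scaling-only stand-in for quantum mean estimation: ~ (sigma / eps) * ln(1 / delta).

    The constant c is hypothetical; only the 1/eps dependence is the point here.
    """
    return math.ceil(c * (sigma / eps) * math.log(1.0 / delta))


def cumulative_bonus(num_episodes, exponent):
    """Sum of confidence widths k^(-exponent) over episodes k = 1..num_episodes."""
    return sum(k ** (-exponent) for k in range(1, num_episodes + 1))


if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        print(eps, classical_samples(eps), quantum_samples(eps))

    K = 10_000
    # Widths ~ k^(-1/2) accumulate like sqrt(K); widths ~ k^(-1) accumulate like ln(K).
    print("classical-rate sum:", cumulative_bonus(K, 0.5))
    print("quantum-rate sum:  ", cumulative_bonus(K, 1.0))
```

Shrinking `eps` by a factor of 10 raises the classical count by about 100 and the quantum-style count by about 10. The second part hints at why faster-shrinking confidence widths can turn a $\sqrt{K}$-type sum into a logarithmic one; the paper's actual regret analysis is more involved than this heuristic.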
Abstract
In this paper, we investigate the problem of \textit{episodic reinforcement learning} with quantum oracles for state evolution. To this end, we propose an \textit{Upper Confidence Bound} (UCB) based quantum algorithmic framework to facilitate learning of a finite-horizon MDP. Our quantum algorithm achieves an exponential improvement in regret as compared to the classical counterparts, achieving a regret of $\tilde{\mathcal{O}}(1)$ as compared to $\tilde{\mathcal{O}}(\sqrt{K})$\footnote{$\tilde{\mathcal{O}}(\cdot)$ hides logarithmic terms.}, $K$ being the number of training episodes. In order to achieve this advantage, we exploit an efficient quantum mean estimation technique that provides a quadratic improvement in the number of i.i.d. samples needed to estimate the mean of sub-Gaussian random variables as compared to classical mean estimation. This improvement is a key to the significant…
Taxonomy
Topics: Quantum Computing Algorithms and Architecture · Quantum Information and Cryptography · Neural Networks and Reservoir Computing
