Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret
Han Zhong, Jiachen Hu, Yecheng Xue, Tongyang Li, Liwei Wang

TL;DR
This paper introduces the first quantum reinforcement learning algorithms with provably logarithmic worst-case regret, significantly improving exploration efficiency over classical methods.
Contribution
The paper develops the first provably efficient quantum RL algorithms for the tabular and linear function approximation settings, breaking the classical $\Omega(\sqrt{T})$ regret lower bound.
Findings
Achieves logarithmic worst-case regret in quantum RL for tabular MDPs.
Extends the results to linear function approximation (linear mixture MDPs), with regret that is polylogarithmic in the number of episodes.
Introduces novel lazy updating and quantum estimation techniques.
Abstract
While quantum reinforcement learning (RL) has attracted a surge of attention recently, its theoretical understanding is limited. In particular, it remains elusive how to design provably efficient quantum RL algorithms that can address the exploration-exploitation trade-off. To this end, we propose a novel UCRL-style algorithm that takes advantage of quantum computing for tabular Markov decision processes (MDPs) with $S$ states, $A$ actions, and horizon $H$, and establish an $\mathcal{O}(\mathrm{poly}(S, A, H, \log T))$ worst-case regret for it, where $T$ is the number of episodes. Furthermore, we extend our results to quantum RL with linear function approximation, which is capable of handling problems with large state spaces. Specifically, we develop a quantum algorithm based on value target regression (VTR) for linear mixture MDPs with $d$-dimensional linear representation and prove…
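The gap between square-root and logarithmic regret can be illustrated with a toy calculation. The sketch below is not the paper's algorithm. It assumes a simplified picture in which the regret in episode n is proportional to the estimation error after n samples, and it uses the standard fact that classical mean estimation has error of order 1/sqrt(n) while quantum mean estimation can reach order 1/n. The function names `cumulative_error` and `compare` are hypothetical helpers introduced only for this illustration.

```python
import math


def cumulative_error(num_episodes, rate):
    """Sum the per-episode errors n**(-rate) for n = 1..num_episodes."""
    return sum(n ** (-rate) for n in range(1, num_episodes + 1))


def compare(num_episodes):
    """Return (classical, quantum) cumulative error under the toy model."""
    # Assumption: classical estimation error shrinks like 1/sqrt(n).
    classical = cumulative_error(num_episodes, 0.5)
    # Assumption: quantum estimation error shrinks like 1/n.
    quantum = cumulative_error(num_episodes, 1.0)
    return classical, quantum


if __name__ == "__main__":
    for T in (10**2, 10**4, 10**6):
        classical, quantum = compare(T)
        print(
            f"T={T:>8}  classical sum={classical:10.2f} "
            f"(2*sqrt(T)={2 * math.sqrt(T):10.2f})  "
            f"quantum sum={quantum:6.2f} (ln T={math.log(T):6.2f})"
        )
```

Under these assumptions the classical sum tracks 2*sqrt(T) while the quantum sum tracks ln(T), which is the intuition behind a regret bound that depends on T only through log T.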
Taxonomy
Topics: Quantum Computing Algorithms and Architecture · Quantum Information and Cryptography · Quantum Mechanics and Applications
