Accelerating Quantum Reinforcement Learning with a Quantum Natural Policy Gradient Based Approach
Yang Xu, Vaneet Aggarwal

TL;DR
This paper introduces a Quantum Natural Policy Gradient algorithm that accelerates quantum reinforcement learning by reducing sample complexity and integrating deterministic gradient estimation into quantum systems.
Contribution
The paper proposes a Quantum Natural Policy Gradient (QNPG) algorithm that replaces the stochastic sampling used in classical Natural Policy Gradient (NPG) estimators with deterministic estimation, improving sample efficiency in quantum reinforcement learning.
Findings
Achieves a sample complexity of Õ(ε^{-1.5}) in queries to the quantum oracle.
Improves on the classical lower bound of Õ(ε^{-2}) in queries to the MDP.
Shows that deterministic gradient estimation integrates into quantum settings at the cost of a bounded bias that decays exponentially with the truncation level.
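To give a feel for the gap between the two rates, the sketch below compares ε^{-1.5} with ε^{-2} for a few target accuracies. It is a rough illustration only. It ignores the constants and polylogarithmic factors hidden by the Õ notation, and the helper names quantum_rate and classical_rate are hypothetical. The ratio between the two rates is ε^{-0.5}, so the advantage grows as the target accuracy tightens.

```python
# Rough scaling comparison of the two sample-complexity rates.
# Constants and log factors hidden by the Õ notation are ignored, so these
# numbers show only the asymptotic trend, not actual query counts.

def quantum_rate(eps: float) -> float:
    # Õ(ε^{-1.5}) queries to the quantum oracle (QNPG upper bound)
    return eps ** -1.5

def classical_rate(eps: float) -> float:
    # Õ(ε^{-2}) queries to the MDP (classical lower bound)
    return eps ** -2.0

for eps in (1e-1, 1e-2, 1e-3):
    q, c = quantum_rate(eps), classical_rate(eps)
    # The ratio c / q equals eps ** -0.5.
    print(f"eps={eps:g}: eps^-1.5={q:,.0f}  eps^-2={c:,.0f}  ratio={c / q:,.1f}")
```

Because the classical figure is a lower bound, any classical method needs at least that many MDP queries. The quantum figure is an upper bound achieved by QNPG.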
Abstract
We address the problem of quantum reinforcement learning (QRL) in model-free settings with quantum oracle access to the Markov Decision Process (MDP). This paper introduces a Quantum Natural Policy Gradient (QNPG) algorithm, which replaces the random sampling used in classical Natural Policy Gradient (NPG) estimators with a deterministic gradient estimation approach, enabling seamless integration into quantum systems. While this modification introduces a bounded bias in the estimator, the bias decays exponentially with increasing truncation levels. This paper demonstrates that the proposed QNPG algorithm achieves a sample complexity of Õ(ε^{-1.5}) for queries to the quantum oracle, significantly improving on the classical lower bound of Õ(ε^{-2}) for queries to the MDP.
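The abstract's central trade-off is that deterministic estimation introduces a truncation bias that decays exponentially. A toy discounted return can illustrate this. The sketch below rests on assumptions that are not taken from the paper: a fixed, hypothetical reward sequence in [0, 1], a discount γ = 0.9, and a truncation level H. It contrasts two estimators. The first is a random-horizon estimator, which draws a geometric horizon so that its expectation equals the discounted return. This is a common classical construction, assumed here rather than confirmed by this summary. The second is a deterministic sum of the first H discounted rewards, whose bias is at most γ^H·r_max/(1−γ). Neither function is the QNPG estimator itself.

```python
import random

# Toy setting (assumed, not from the paper): rewards in [0, 1], r_max = 1.

def reward(t: int) -> float:
    # Hypothetical reward sequence: 1 at even steps, 0 at odd steps.
    return 0.5 + 0.5 * ((-1) ** t)

def infinite_return(gamma: float, n_terms: int = 10_000) -> float:
    # Reference discounted return; the tail beyond n_terms is negligible.
    return sum(gamma**t * reward(t) for t in range(n_terms))

def truncated_return(gamma: float, H: int) -> float:
    # Deterministic estimator: sum the first H discounted rewards, no sampling.
    return sum(gamma**t * reward(t) for t in range(H))

def random_horizon_return(gamma: float, rng: random.Random) -> float:
    # Random-horizon estimator: step t is reached with probability gamma**t,
    # so summing undiscounted rewards gives an unbiased discounted return.
    total, t = 0.0, 0
    while True:
        total += reward(t)
        if rng.random() > gamma:  # stop with probability 1 - gamma
            return total
        t += 1

gamma = 0.9
exact = infinite_return(gamma)
for H in (10, 20, 40, 80):
    bias = exact - truncated_return(gamma, H)
    bound = gamma**H / (1 - gamma)  # gamma^H * r_max / (1 - gamma)
    print(f"H={H:3d}  bias={bias:.2e}  bound={bound:.2e}")

rng = random.Random(0)
n = 20_000
mc = sum(random_horizon_return(gamma, rng) for _ in range(n)) / n
print(f"exact={exact:.3f}  random-horizon mean={mc:.3f}")
```

The truncated estimator needs no randomness, and its bias shrinks geometrically as H grows. The random-horizon estimator is unbiased but stochastic.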
Taxonomy
Topics: Quantum Computing Algorithms and Architecture · Quantum Information and Cryptography
