Quantum Policy Gradient in Reproducing Kernel Hilbert Space
David M. Bossens, Kishor Bharti, and Jayne Thompson

TL;DR
This paper introduces quantum kernel policies and quantum policy gradient algorithms for quantum reinforcement learning, demonstrating reduced query complexity and enhanced expressiveness in quantum environments.
Contribution
The paper extends kernel methods to quantum RL by proposing quantum policy gradient algorithms with both parametric and non-parametric policies, achieving a quadratic reduction in query complexity relative to classical methods.
Findings
Quantum kernel policies enable efficient quantum RL in high-dimensional Hilbert spaces.
The proposed algorithms demonstrate quadratic reduction in query complexity compared to classical methods.
Actor-critic algorithms further reduce query complexity under certain conditions.
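For orientation, the classical idea that kernel policies build on can be sketched as follows. This is not the paper's quantum algorithm. It is a minimal classical analogue, assuming a one-dimensional Gaussian policy whose mean is a kernel expansion f(s) = sum_i beta_i k(c_i, s) and a REINFORCE-style functional gradient step. The names `rbf_kernel`, `KernelPolicy`, `bandwidth`, `sigma`, and `learning_rate` are hypothetical and chosen for illustration only.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two state vectors (assumed kernel choice)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * bandwidth ** 2)))


class KernelPolicy:
    """Gaussian policy with mean f(s) = sum_i beta_i * k(c_i, s).

    Classical, non-parametric sketch: the policy is stored as a growing
    list of kernel centres c_i and scalar weights beta_i.
    """

    def __init__(self, sigma=0.5, bandwidth=1.0):
        self.centres = []   # states at which kernels are placed
        self.weights = []   # one scalar weight per centre
        self.sigma = sigma  # fixed standard deviation of the action noise
        self.bandwidth = bandwidth

    def mean(self, state):
        """Evaluate the kernel expansion f(state)."""
        return float(sum(w * rbf_kernel(c, state, self.bandwidth)
                         for c, w in zip(self.centres, self.weights)))

    def sample(self, state, rng):
        """Draw an action from N(f(state), sigma^2)."""
        return self.mean(state) + self.sigma * rng.standard_normal()

    def update(self, trajectory, learning_rate=0.1):
        """One functional-gradient step from (state, action, return) tuples.

        For a Gaussian policy, the functional gradient of log pi(a | s)
        with respect to f is k(s, .) * (a - f(s)) / sigma^2, so each sample
        contributes one new kernel centre at s with the weight below.
        """
        new_terms = []
        for state, action, ret in trajectory:
            # Residuals use the mean *before* this update is applied.
            residual = action - self.mean(state)
            coeff = learning_rate * ret * residual / self.sigma ** 2
            new_terms.append((np.asarray(state, dtype=float), coeff))
        for centre, coeff in new_terms:
            self.centres.append(centre)
            self.weights.append(coeff)


# Usage: one update from a single made-up (state, action, return) sample.
rng = np.random.default_rng(0)
policy = KernelPolicy(sigma=0.5)
policy.update([([0.0], 1.0, 2.0)], learning_rate=0.1)
action = policy.sample([0.0], rng)
```

Each update adds one centre per sample, so a practical implementation would usually sparsify the expansion. That step is omitted here to keep the sketch short.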
Abstract
Parametrised quantum circuits offer expressive and data-efficient representations for machine learning. Due to quantum states residing in a high-dimensional Hilbert space, parametrised quantum circuits have a natural interpretation in terms of kernel methods. The representation of quantum circuits in terms of quantum kernels has been studied widely in quantum supervised learning, but has been overlooked in the context of quantum RL. This paper proposes the use of kernel policies and quantum policy gradient algorithms for quantum-accessible environments. After discussing the properties of such policies and a demonstration of classical policy gradient on a coherent policy in a quantum environment, we propose parametric and non-parametric policy gradient and actor-critic algorithms with quantum kernel policies in quantum environments. This approach, implemented with both numerical and…
Taxonomy
Topics: Quantum Computing Algorithms and Architecture
Methods: Experience Replay · Weight Decay · Batch Normalization · Dense Connections · Convolution · Adam · Deep Deterministic Policy Gradient
