Quantum Policy Gradient in Reproducing Kernel Hilbert Space

David M. Bossens, Kishor Bharti, and Jayne Thompson

arXiv:2411.06650·quant-ph·August 12, 2025

TL;DR

This paper introduces quantum kernel policies and quantum policy gradient algorithms for quantum reinforcement learning, demonstrating reduced query complexity and enhanced expressiveness in quantum environments.

Contribution

The paper extends kernel methods to quantum RL by proposing quantum policy gradient algorithms with both parametric and non-parametric policies, achieving a quadratic reduction in query complexity relative to classical methods.

Findings

1. Quantum kernel policies enable efficient quantum RL in high-dimensional Hilbert spaces.

2. The proposed algorithms demonstrate a quadratic reduction in query complexity compared to classical methods.

3. Actor-critic algorithms further reduce query complexity under certain conditions.

Abstract

Parametrised quantum circuits offer expressive and data-efficient representations for machine learning. Due to quantum states residing in a high-dimensional Hilbert space, parametrised quantum circuits have a natural interpretation in terms of kernel methods. The representation of quantum circuits in terms of quantum kernels has been studied widely in quantum supervised learning, but has been overlooked in the context of quantum RL. This paper proposes the use of kernel policies and quantum policy gradient algorithms for quantum-accessible environments. After discussing the properties of such policies and a demonstration of classical policy gradient on a coherent policy in a quantum environment, we propose parametric and non-parametric policy gradient and actor-critic algorithms with quantum kernel policies in quantum environments. This approach, implemented with both numerical and…
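The kernel interpretation mentioned in the abstract can be illustrated with a small numerical sketch. It is not the paper's construction: the single-qubit feature map, the fidelity kernel, and the names `feature_state`, `fidelity_kernel`, `policy_mean`, `centres`, and `weights` are all assumptions chosen for illustration. The sketch encodes a scalar state as a one-qubit quantum state, takes the squared overlap of two such states as the kernel, and uses a weighted sum of kernel evaluations as the mean action of a kernel policy.

```python
import numpy as np


def feature_state(x: float) -> np.ndarray:
    """Hypothetical feature map: RY(x)|0> = [cos(x/2), sin(x/2)]."""
    return np.array([np.cos(x / 2.0), np.sin(x / 2.0)])


def fidelity_kernel(x: float, y: float) -> float:
    """State-overlap kernel k(x, y) = |<phi(x)|phi(y)>|^2 (an assumed choice)."""
    overlap = np.vdot(feature_state(x), feature_state(y))
    return float(np.abs(overlap) ** 2)


def policy_mean(state: float, centres: np.ndarray, weights: np.ndarray) -> float:
    """Mean action of a kernel policy: sum_i weights[i] * k(state, centres[i])."""
    k = np.array([fidelity_kernel(state, c) for c in centres])
    return float(weights @ k)


# Illustrative (made-up) policy centres and weights.
centres = np.array([0.0, np.pi / 2, np.pi])
weights = np.array([1.0, -0.5, 0.25])
mean_action = policy_mean(0.0, centres, weights)
```

For this particular feature map the kernel reduces to cos²((x − y)/2), so it equals 1 for identical inputs and falls to 0 when the inputs differ by π. A stochastic policy could then sample actions around `mean_action`, for example from a Gaussian; that step is omitted here.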

Taxonomy

Topics: Quantum Computing Algorithms and Architecture

Methods: Experience Replay · Weight Decay · Batch Normalization · Dense Connections · Convolution · Adam · Deep Deterministic Policy Gradient