QuantFPFlow: Quantum Amplitude Estimation for Fokker–Planck Policy Optimisation in Continuous Reinforcement Learning
Abraham Itzhak Weinberg

TL;DR
QuantFPFlow introduces a quantum-inspired reinforcement learning framework that leverages amplitude estimation for efficient policy optimization, achieving quadratic speedup and improved exploration in continuous control tasks.
Contribution
It presents a novel quantum-inspired method integrating amplitude estimation into Fokker–Planck RL, enhancing efficiency and exploration in continuous reinforcement learning.
Findings
Achieves quadratic speedup in estimating the FP partition function.
Outperforms Soft Actor-Critic in a multimodal reward task.
Scales more efficiently with dimensionality compared to classical methods.
Abstract
We introduce QuantFPFlow, a reinforcement learning framework that integrates quantum amplitude estimation into the Fokker–Planck (FP) formulation of stochastic policy optimisation. Classical continuous-space RL agents must estimate the FP partition function at cost O(1/ε²) for target precision ε; QuantFPFlow replaces this with a Grover-amplified amplitude estimator achieving O(1/ε), a provable quadratic speedup. While the full quantum acceleration requires fault-tolerant hardware, the quantum-inspired classical simulation demonstrated here already exhibits the algorithmic structure. The estimated stationary distribution drives a theoretically grounded exploration bonus. This bonus steers the agent toward globally optimal regions of…
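The classical baseline in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes overdamped Langevin dynamics, for which the FP stationary density is proportional to exp(-U(x)/D), so the partition function is the integral of that weight. The double-well `potential`, the diffusion constant, the integration window, and all function names are hypothetical choices made for illustration. Amplitude estimation itself is not simulated; `query_budgets` only compares the textbook query counts with constants dropped.

```python
import numpy as np


def potential(x):
    # Hypothetical double-well potential with two modes (illustration only).
    return (x**2 - 1.0) ** 2


def mc_partition(n_samples, rng, half_width=3.0, diffusion=0.5):
    # Uniform-sampling Monte Carlo estimate of Z = integral of exp(-U(x)/D)
    # over [-half_width, half_width].
    x = rng.uniform(-half_width, half_width, size=n_samples)
    return 2.0 * half_width * np.mean(np.exp(-potential(x) / diffusion))


def reference_partition(half_width=3.0, diffusion=0.5, n_grid=200_001):
    # Dense trapezoidal quadrature, used here as the ground-truth value of Z.
    x = np.linspace(-half_width, half_width, n_grid)
    y = np.exp(-potential(x) / diffusion)
    dx = x[1] - x[0]
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))


def rms_error(n_samples, n_trials=200, seed=0):
    # Root-mean-square error of the Monte Carlo estimate over repeated trials.
    rng = np.random.default_rng(seed)
    z_ref = reference_partition()
    errs = [mc_partition(n_samples, rng) - z_ref for _ in range(n_trials)]
    return float(np.sqrt(np.mean(np.square(errs))))


def query_budgets(eps):
    # Order-of-magnitude query counts for precision eps, constants dropped:
    # classical Monte Carlo ~ 1/eps^2, amplitude estimation ~ 1/eps.
    return {
        "classical_mc": int(np.ceil(1.0 / eps**2)),
        "amplitude_estimation": int(np.ceil(1.0 / eps)),
    }


if __name__ == "__main__":
    for n in (100, 10_000):
        print(n, rms_error(n))
    print(query_budgets(0.01))
```

Raising the sample count by a factor of 100 should shrink the Monte Carlo error by roughly a factor of 10, which is the 1/ε² cost that the amplitude estimator is claimed to reduce to 1/ε.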
