Trainability issues in quantum policy gradients
André Sequeira, Luis Paulo Santos, Luis Soares Barbosa

TL;DR
This paper investigates the challenges of training quantum policy gradients in reinforcement learning, highlighting issues like barren plateaus and gradient explosion, and proposes conditions for trainability with empirical validation.
Contribution
It identifies key factors affecting trainability of quantum policies and provides conditions to ensure trainability with polynomial measurements, supported by empirical results.
Findings
Standard barren plateaus cause small gradients
Gradient explosion depends on basis-state partitioning
Contiguous partitioning enables trainability with polynomial measurements
Abstract
This research explores the trainability of parameterized quantum circuit-based policies in reinforcement learning, an area that has recently seen a surge in empirical exploration. While some studies suggest improved sample complexity using quantum gradient estimation, the efficient trainability of these policies remains an open question. Our findings reveal significant challenges, including standard barren plateaus with exponentially small gradients and gradient explosion. These phenomena depend on the type of basis-state partitioning and on how these partitions are mapped onto actions. For a polynomial number of actions, a trainable window can be ensured with a polynomial number of measurements if a contiguous-like partitioning of basis states is employed. These results are empirically validated in a multi-armed bandit environment.
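To make the idea of basis-state partitioning concrete, here is a minimal sketch of one possible reading of a contiguous partition: consecutive blocks of basis-state indices are assigned to the same action, and the probability of an action is the total measurement probability of its block. The helper names (`contiguous_partition`, `policy_from_probs`) are hypothetical, and a random normalized state stands in for the output of a parameterized circuit. None of this is taken from the paper's implementation.

```python
import numpy as np

def contiguous_partition(n_qubits, n_actions):
    """Map each basis-state index to an action using consecutive blocks.

    Assumption: the number of basis states is divisible by n_actions.
    """
    n_states = 2 ** n_qubits
    if n_states % n_actions != 0:
        raise ValueError("n_actions must divide 2**n_qubits in this sketch")
    block_size = n_states // n_actions
    return np.arange(n_states) // block_size

def policy_from_probs(probs, assignment, n_actions):
    """Sum basis-state probabilities within each partition to get pi(a)."""
    return np.bincount(assignment, weights=probs, minlength=n_actions)

# Stand-in for a circuit output: a random normalized 3-qubit state.
rng = np.random.default_rng(0)
n_qubits, n_actions = 3, 2
amps = rng.normal(size=2 ** n_qubits) + 1j * rng.normal(size=2 ** n_qubits)
probs = np.abs(amps) ** 2
probs /= probs.sum()

assignment = contiguous_partition(n_qubits, n_actions)
pi = policy_from_probs(probs, assignment, n_actions)
print(assignment)  # basis states 0-3 go to action 0, 4-7 to action 1
print(pi)          # action probabilities, summing to 1
```

A non-contiguous alternative, such as assigning basis states by bit-string parity, would only change the `assignment` array. The abstract indicates that this choice is what determines the gradient behaviour.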
Taxonomy
Topics: Quantum Computing Algorithms and Architecture · Quantum Mechanics and Applications
