Dissecting Quantum Reinforcement Learning: A Systematic Evaluation of Key Components
Javier Lazaro, Juan-Ignacio Vazquez, Pablo Garcia-Bringas

TL;DR
This paper systematically evaluates key components of quantum reinforcement learning architectures, revealing how data embedding, entanglement, and post-processing influence training stability and performance.
Contribution
It provides the first controlled empirical analysis of PQC-based QRL components, establishing a reproducible benchmarking framework.
Findings
Output Reuse (OR) exhibits unique behavior in hybrid pipelines.
Data Reuploading (DR) enhances trainability and stability.
Stronger entanglement can impair optimization.
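As a rough illustration of what Data Reuploading means, the toy sketch below simulates a single-qubit circuit in NumPy in which the input x is either encoded once or re-encoded before every trainable layer. This is not the authors' PPO-CartPole architecture: the function names (`ry`, `rz`, `expectation_z`), the choice of RY encoding, and the RZ/RY trainable layer are assumptions made only for this example.

```python
import numpy as np

def ry(theta):
    """Single-qubit rotation about the Y axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(theta):
    """Single-qubit rotation about the Z axis."""
    return np.array([[np.exp(-1j * theta / 2), 0],
                     [0, np.exp(1j * theta / 2)]], dtype=complex)

def expectation_z(x, params, reupload=True):
    """Return <Z> for a toy one-qubit circuit (hypothetical layout).

    params has shape (n_layers, 2): one RZ angle and one RY angle per layer.
    With reupload=True the input x is encoded before every trainable layer;
    with reupload=False it is encoded only before the first layer.
    """
    state = np.array([1, 0], dtype=complex)  # start in |0>
    for layer, (a, b) in enumerate(params):
        if reupload or layer == 0:
            state = ry(x) @ state            # data-encoding rotation
        state = ry(b) @ rz(a) @ state        # trainable rotations
    probs = np.abs(state) ** 2
    return float(probs[0] - probs[1])        # <Z> = P(0) - P(1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = rng.uniform(-np.pi, np.pi, size=(3, 2))
    for x in (0.1, 0.5, 1.0):
        print(x,
              expectation_z(x, params, reupload=True),
              expectation_z(x, params, reupload=False))
```

With all trainable angles set to zero, three reuploading layers give cos(3x) while single encoding gives cos(x). This shows in miniature how repeated encoding lets the circuit output depend on higher frequencies of the input. It says nothing about the trainability or stability effects reported in the paper.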
Abstract
Parameterised quantum circuit (PQC)-based Quantum Reinforcement Learning (QRL) has emerged as a promising paradigm at the intersection of quantum computing and reinforcement learning (RL). By design, PQCs create hybrid quantum-classical models, but their practical applicability remains uncertain due to training instabilities, barren plateaus (BPs), and the difficulty of isolating the contribution of individual pipeline components. In this work, we dissect PQC-based QRL architectures through a systematic experimental evaluation of three aspects recurrently identified as critical: (i) data embedding strategies, with Data Reuploading (DR) as an advanced approach; (ii) ansatz design, particularly the role of entanglement; and (iii) post-processing blocks after quantum measurement, with a focus on the underexplored Output Reuse (OR) technique. Using a unified PPO-CartPole framework, we…
Taxonomy
Topics: Quantum Computing Algorithms and Architecture · Quantum Information and Cryptography · Quantum-Dot Cellular Automata
