Replay-buffer engineering for noise-robust quantum circuit optimization
Akash Kundu, Sebastian Feld

TL;DR
This paper introduces replay-buffer strategies and evaluation methods for noise-robust quantum circuit optimization with deep reinforcement learning, reporting 4-32x sample-efficiency gains, up to 67.5% shorter evaluation time, and 85-90% fewer steps to chemical accuracy in noisy settings.
Contribution
It presents ReaPER+, an annealed replay-sampling rule; OptCRLQAS, which speeds up evaluation in curriculum-based architecture search; and a transfer scheme for retraining under hardware noise.
Findings
ReaPER+ achieves 4-32x sample efficiency gains.
OptCRLQAS cuts evaluation time by up to 67.5%.
Transfer scheme reduces steps to chemical accuracy by 85-90%.
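The findings mix multiplicative gains (4-32x) with percentage reductions (67.5%, 85-90%). A minimal sketch of converting a percentage reduction into a speedup factor so the numbers can be compared on one scale; the helper name speedup_from_reduction is hypothetical and does not come from the paper.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a percentage reduction in cost into a multiplicative speedup.

    A cost that drops by r percent becomes (1 - r/100) of its original value,
    so the speedup factor is the reciprocal of that fraction.
    """
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction_pct must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


if __name__ == "__main__":
    # Percentages taken from the Findings list above.
    for pct in (67.5, 85.0, 90.0):
        print(f"{pct}% reduction is roughly a {speedup_from_reduction(pct):.2f}x speedup")
```

Under this conversion, a 67.5% cut in evaluation time is roughly a 3.08x speedup, and an 85-90% reduction in steps is roughly a 6.67x to 10x speedup.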
Abstract
Deep reinforcement learning (RL) for quantum circuit optimization faces three fundamental bottlenecks: replay buffers that ignore the reliability of temporal-difference (TD) targets, curriculum-based architecture search that triggers a full quantum-classical evaluation at every environment step, and the routine discard of noiseless trajectories when retraining under hardware noise. We address all three by treating the replay buffer as a primary algorithmic lever for quantum optimization. We introduce ReaPER+, an annealed replay rule that transitions from TD-error-driven prioritization early in training to reliability-aware sampling as value estimates mature, achieving gains in sample efficiency over fixed PER, ReaPER, and uniform replay while consistently discovering more compact circuits across quantum compilation and QAS benchmarks; validation on LunarLander-v3 confirms…
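The annealed replay idea in the abstract can be illustrated with a small sketch. This is not the paper's rule: the linear schedule, the priority form |TD|^alpha * reliability^lam, the alpha value, and every function name here are assumptions chosen for illustration, and the per-transition reliability score in (0, 1] is taken as a given input rather than computed.

```python
import numpy as np


def anneal_weight(step: int, total_steps: int) -> float:
    """Assumed linear schedule: 0 = TD-error driven, 1 = reliability-aware."""
    return float(np.clip(step / max(total_steps, 1), 0.0, 1.0))


def sampling_probs(td_errors, reliability, lam, alpha=0.6, eps=1e-6):
    """Blend TD-error priority with a reliability score (assumed form).

    With lam = 0 the reliability term equals 1 for every transition, so the
    result is plain proportional prioritization on |TD error|. As lam grows,
    transitions with low reliability are sampled less often.
    """
    td_term = (np.abs(np.asarray(td_errors, dtype=float)) + eps) ** alpha
    rel_term = np.clip(np.asarray(reliability, dtype=float), eps, 1.0) ** lam
    priority = td_term * rel_term
    return priority / priority.sum()


def sample_batch(rng, probs, batch_size):
    """Draw buffer indices according to the sampling probabilities."""
    return rng.choice(len(probs), size=batch_size, p=probs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    td = [1.0, 1.0, 0.2]    # hypothetical TD errors
    rel = [1.0, 0.1, 1.0]   # hypothetical reliability scores
    for step in (0, 500, 1000):
        lam = anneal_weight(step, total_steps=1000)
        probs = sampling_probs(td, rel, lam)
        print(step, np.round(probs, 3), sample_batch(rng, probs, 4))
```

Early in training the first two transitions are sampled equally often because they share the same TD error; late in training the second one is down-weighted because its target is marked as unreliable.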
