Taking gradients through experiments: LSTMs and memory proximal policy optimization for black-box quantum control
Moritz August, José Miguel Hernández-Lobato

TL;DR
This paper frames black-box quantum control as a reinforcement learning problem and trains LSTM-parameterized agents with memory proximal policy optimization (MPPO), a variant of PPO, reporting state-of-the-art results on several quantum control tasks with discrete and continuous control parameters.
Contribution
The paper proposes MPPO, a variant of PPO motivated by the structure of reinforcement learning problems arising in quantum physics, in which agents are parameterized by LSTM networks and trained via stochastic policy gradients.
Findings
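Reports state-of-the-art results on several quantum control tasks with discrete and continuous control parameters.
Argues that LSTM-parameterized agents trained via stochastic policy gradients are a general method for black-box quantum control.
Introduces MPPO, a variant of PPO, for reinforcement learning in black-box quantum control.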
Abstract
In this work, we introduce the application of black-box quantum control as an interesting reinforcement learning problem to the machine learning community. We analyze the structure of the reinforcement learning problems arising in quantum physics and argue that agents parameterized by long short-term memory (LSTM) networks trained via stochastic policy gradients yield a general method for solving them. In this context, we introduce a variant of the proximal policy optimization (PPO) algorithm, called memory proximal policy optimization (MPPO), which is based on this analysis. We then show how it can be applied to specific learning tasks and present results of numerical experiments showing that our method achieves state-of-the-art results for several learning tasks in quantum control with discrete and continuous control parameters.
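For readers unfamiliar with PPO, the sketch below shows the standard clipped surrogate objective that PPO maximizes. It is not the MPPO variant from the paper, whose details are not given in this abstract. The function name `ppo_clipped_objective`, its arguments, and the default `clip_eps=0.2` are illustrative assumptions.

```python
import math

def ppo_clipped_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate, averaged over samples (to be maximized).

    Illustrative sketch only: this is plain PPO, not the paper's MPPO.
    new_logps / old_logps: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic choice: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the current and old policies agree, the ratio is 1 and the objective reduces to the mean advantage. Clipping removes the incentive to push the ratio beyond `1 + clip_eps` for positive advantages or below `1 - clip_eps` for negative ones.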
