Model-free reinforcement learning with noisy actions for automated experimental control in optics
Lea Richtmann, Viktoria-S. Schmiesing, Dennis Wilken, Jan Heine, Aaron Tranter, Avishek Anand, Tobias J. Osborne, Michèle Heurs

TL;DR
This paper demonstrates that model-free reinforcement learning can efficiently control complex optical systems directly through experiments, achieving high coupling efficiency faster than humans and reducing the need for detailed system modeling.
Contribution
It introduces a model-free RL approach that uses sample-efficient algorithms to control an optical system directly on hardware, without relying on system simulations. The agents match a human expert's coupling efficiency and reach it faster.
Findings
RL agents achieve 90% coupling efficiency.
CrossQ outperforms the other tested RL algorithms (SAC, TQC) in coupling speed and training time.
Direct experimental training can replace detailed system modeling.
Abstract
Setting up and controlling optical systems is often a challenging and tedious task. The high number of degrees of freedom to control mirrors, lenses, or phases of light makes automatic control challenging, especially when the complexity of the system cannot be adequately modeled due to noise or non-linearities. Here, we show that reinforcement learning (RL) can overcome these challenges when coupling laser light into an optical fiber, using a model-free RL approach that trains directly on the experiment without pre-training on simulations. By utilizing the sample-efficient algorithms Soft Actor-Critic (SAC), Truncated Quantile Critics (TQC), or CrossQ, our agents learn to couple with 90% efficiency. A human expert reaches this efficiency, but the RL agents are quicker. In particular, the CrossQ agent outperforms the other agents in coupling speed while requiring only half the training…
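The paper deliberately trains on hardware because the system is hard to model, so the following is only a toy illustration of the interface the agents face, not the authors' setup. It is a minimal Python sketch under stated assumptions: a pair of actuators that shift the beam laterally, a coupling efficiency given by the Gaussian-mode overlap η = exp(−d²/w²) for a lateral offset d between identical modes of waist w, and a hypothetical multiplicative noise model for the actuators. The class name `ToyFiberCoupling` and every parameter value are illustrative assumptions.

```python
import math
import random


class ToyFiberCoupling:
    """Hypothetical toy stand-in for fiber coupling with noisy actuators.

    State: lateral beam offset (dx, dy) in units of the beam waist w.
    Efficiency: Gaussian mode overlap, eta = exp(-(dx^2 + dy^2)).
    Actions: requested steps in [-1, 1]^2, scaled by max_step and executed
    with multiplicative Gaussian noise (an assumed noise model).
    Observation: only the noisily measured efficiency; the agent never
    sees the offsets directly (partial observability).
    """

    def __init__(self, max_step=0.2, action_noise=0.2, meas_noise=0.005,
                 goal=0.9, max_steps=200, seed=None):
        self.max_step = max_step
        self.action_noise = action_noise
        self.meas_noise = meas_noise
        self.goal = goal
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.dx = self.dy = 0.0
        self.t = 0

    def efficiency(self):
        # Power coupling between two identical Gaussian modes offset by d (in waists).
        return math.exp(-(self.dx ** 2 + self.dy ** 2))

    def _observe(self):
        # Power-meter reading with additive noise, clipped to [0, 1].
        eta = self.efficiency() + self.rng.gauss(0.0, self.meas_noise)
        return min(max(eta, 0.0), 1.0)

    def reset(self):
        # Start from a random misalignment of 1 to 2 beam waists.
        r = self.rng.uniform(1.0, 2.0)
        phi = self.rng.uniform(0.0, 2.0 * math.pi)
        self.dx, self.dy = r * math.cos(phi), r * math.sin(phi)
        self.t = 0
        return self._observe()

    def step(self, action):
        # Clip the requested action to [-1, 1] on each axis.
        ax, ay = (min(max(a, -1.0), 1.0) for a in action)
        # Each actuator moves by the requested step times a random gain.
        self.dx += self.max_step * ax * (1.0 + self.rng.gauss(0.0, self.action_noise))
        self.dy += self.max_step * ay * (1.0 + self.rng.gauss(0.0, self.action_noise))
        self.t += 1
        obs = self._observe()
        # Goal is judged on the measured reading, as it would be on hardware.
        done = obs >= self.goal
        truncated = self.t >= self.max_steps
        return obs, done, truncated
```

Calling `reset()` and then `step(action)` repeatedly mirrors one episode: the agent sees only a noisy power reading and must find the optimum even though the actuators do not move exactly as commanded. Training an off-the-shelf agent would additionally need a reward and a `gymnasium.Env` wrapper, both omitted here.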
Peer Reviews
Decision: Submitted to ICLR 2025
- The authors have rigorously designed the experiments, performing method selection in simulation and experimentation on hardware.
- The presentation is very clear. The paper covers the problem of interest, potential challenges, design choices, experiment designs, and results in great detail.
- The main experimental results are convincing: the RL agent can overcome action stochasticity and, working from limited observations, solve the optical fiber coupling problem above the human level.
- It is nice to see RL applied to a physical control system and providing utility for research in another domain. However, the control problem solved in this work is relatively simple, especially given that the robotics community has trained RL policies on systems with more degrees of freedom, even from image observations.
- I sense there is not much representation learning in this project. I think the insights would be more relevant if the authors could scale the experiment up to higher-dimensional observations.
Originality: In optical systems, where simulation is difficult due to noise and system complexity, this research applies model-free reinforcement learning in an area where traditional models fall short. Rigorous experimental results demonstrate the RL agent's ability to attain high coupling efficiencies directly on physical hardware, matching or surpassing human performance.
Significance: The effective use of RL in this context may encourage the development of comparable methods in other…
Limited Generalization: Although the RL agent performs admirably, it is designed specifically for this experimental configuration. The paper's significance would be enhanced by greater generalizability or adaptability to different optical settings.
Training Time: The time needed to reach high efficiency, which can be up to four days, may limit the practical use of this strategy in time-sensitive settings.
Dependency on Particular Hardware: The configuration is…
- Good exposition of the problem being solved and the associated challenges.
- Extensive experiments demonstrating the feasibility of the application.
- Detailed description of the experimental setup. Without a background in optics experimentation, I got the feeling that I could reproduce the experiments to some degree if required to. The paper contains thorough descriptions of design choices regarding the action space, processing of observations, formulation of the reward function, reset procedure…
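The reward formulation is one of the design choices the reviewer highlights. The paper's exact reward is not reproduced in this summary, so the following is only one plausible, hypothetical design, sketched in Python: it rewards the change in measured efficiency, subtracts a small per-step penalty so that faster coupling earns a higher return, and adds a bonus once a goal efficiency is reached (0.9 here, echoing the 90% figure). The function name `shaped_reward` and all constants are assumptions.

```python
def shaped_reward(eta_prev, eta, goal=0.9, step_penalty=0.01, goal_bonus=1.0):
    """Hypothetical dense reward for one step of a fiber-coupling task.

    eta_prev, eta: measured coupling efficiency before and after the action,
    both in [0, 1]. Returns (reward, done).
    """
    # Episode ends once the measured efficiency reaches the goal.
    done = eta >= goal
    # Reward improvement and charge a fixed cost per step taken.
    reward = (eta - eta_prev) - step_penalty
    if done:
        # One-off bonus for reaching the goal efficiency.
        reward += goal_bonus
    return reward, done
```

Because the per-step differences telescope to the total change in efficiency, an agent that merely hovers at moderate efficiency gains nothing from the improvement term while the step penalty keeps accumulating; the penalty is what ties the return to coupling speed.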
The only reservation I have is **whether ICLR is a good venue for this submission**. This contribution demonstrates that off-the-shelf RL algorithms (literally Stable-Baselines implementations) can be used to automate parts of the calibration and setup of optics experiments in the presence of noisy actuators. This in itself is neither surprising nor novel. DRL algorithms have been shown to perform control in real-life experiments with noisy actuators, partial observability and, in addition to this work…
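The reviews mention action stochasticity, limited observations, and partial observability. One common way to give a policy more to work with when it sees only a scalar power reading, assumed here and not necessarily what the authors do, is to feed it a short history of readings together with the actions that produced them. A minimal Python sketch with a hypothetical `ObservationHistory` helper:

```python
from collections import deque


class ObservationHistory:
    """Hypothetical helper: concatenate the last k (reading, action) pairs.

    A single power reading says little about which direction improves
    coupling; a short history of readings and the actions that produced
    them can let the policy infer a local trend despite noisy actuators.
    """

    def __init__(self, k, action_dim):
        self.k = k
        self.action_dim = action_dim
        self.buf = deque(maxlen=k)

    def reset(self, reading):
        # Fill the history with the initial reading and zero actions.
        self.buf.clear()
        for _ in range(self.k):
            self.buf.append((reading, (0.0,) * self.action_dim))
        return self.vector()

    def push(self, reading, action):
        # Append the newest pair; the deque drops the oldest automatically.
        self.buf.append((reading, tuple(action)))
        return self.vector()

    def vector(self):
        # Flatten oldest-to-newest into one list for the policy network.
        out = []
        for reading, action in self.buf:
            out.append(reading)
            out.extend(action)
        return out
```

With k = 3 and a two-dimensional action, the policy input has 9 entries; the history length trades extra context against a larger input for the networks to learn from.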
Taxonomy
Topics: Advanced Optical Sensing Technologies · Semiconductor Lasers and Optical Devices · Iterative Learning Control Systems
