Stepping Out of the Shadows: Reinforcement Learning in Shadow Mode
Philipp Gassert, Matthias Althoff

TL;DR
This paper introduces a shadow mode reinforcement learning approach that trains agents alongside conventional controllers to improve performance in cyber-physical systems while minimizing risks and training time.
Contribution
The shadow mode training method lets RL agents learn from an existing conventional controller, reducing risk and training time on physical systems for which no adequate simulation model exists.
Findings
Effective training on a reach-avoid task where standard methods fail
Low regret during training by combining RL and conventional control
Improved system performance over traditional controllers
Abstract
Reinforcement learning (RL) is not yet competitive for many cyber-physical systems, such as robotics, process automation, and power systems, as training on a system with physical components cannot be accelerated, and simulation models do not exist or suffer from a large simulation-to-reality gap. During the long training time, expensive equipment cannot be used and might even be damaged due to inappropriate actions of the reinforcement learning agent. Our novel approach addresses exactly this problem: We train the reinforcement agent in a so-called shadow mode with the assistance of an existing conventional controller, which does not have to be trained and instantaneously performs reasonably well. In shadow mode, the agent relies on the controller to provide action samples and guidance towards favourable states to learn the task, while simultaneously estimating for which states the…
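As a rough illustration of the idea that the controller supplies the action samples while the agent learns alongside it, the sketch below keeps a toy agent entirely in the shadow: only the controller's action is ever applied, and the agent updates its value estimates off-policy from those transitions. Everything in it is an assumption made for illustration and is not taken from the paper: the 1-D plant, the reward, the names `controller`, `step`, `train_in_shadow`, and `greedy`, and the use of tabular Q-learning. It also leaves out the agent's estimate of the states referred to in the truncated part of the abstract, as well as any hand-over of control to the agent.

```python
ACTIONS = (-1, 0, 1)
GOAL = 5


def controller(x):
    """Hypothetical conventional controller: move one step toward GOAL."""
    if x < GOAL:
        return 1
    if x > GOAL:
        return -1
    return 0


def step(x, a):
    """Toy 1-D plant on the integers 0..10 (an assumption, not the paper's system)."""
    nx = min(10, max(0, x + a))
    done = nx == GOAL
    reward = 1.0 if done else -0.1
    return nx, reward, done


def train_in_shadow(episodes=200, alpha=0.5, gamma=0.9):
    """Tabular Q-learning from controller-generated transitions only."""
    q = {(x, a): 0.0 for x in range(11) for a in ACTIONS}
    for ep in range(episodes):
        x = ep % 11  # cycle deterministically through all start states
        for _ in range(20):
            a = controller(x)          # only the controller acts on the plant
            nx, r, done = step(x, a)
            best_next = 0.0 if done else max(q[(nx, b)] for b in ACTIONS)
            # Off-policy update of the shadow agent's estimate for (x, a).
            q[(x, a)] += alpha * (r + gamma * best_next - q[(x, a)])
            x = nx
            if done:
                break
    return q


def greedy(q, x):
    """Action the shadow agent would pick if it were in control."""
    return max(ACTIONS, key=lambda a: q[(x, a)])
```

In this toy setting the agent never touches the plant during training, so the training risk is the controller's own. The flip side is that the agent only sees actions the controller chose, so it can at best reproduce the controller here, not improve on it.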
Taxonomy
Topics: Complex Systems and Decision Making · Cognitive Science and Mapping
