Exploration Policies for On-the-Fly Controller Synthesis: A Reinforcement Learning Approach
Tomás Delgado, Marco Sánchez Sorondo, Víctor Braberman, Sebastián Uchitel

TL;DR
This paper introduces a reinforcement learning-based heuristic for on-the-fly controller synthesis, enabling efficient, generalized strategy generation in non-deterministic environments without exhaustive exploration.
Contribution
It proposes a novel RL approach with a modified DQN to learn heuristics that generalize to larger problem instances, improving over domain-independent heuristics.
Findings
RL-based heuristics outperform existing heuristics in unseen instances
Heuristics learned on small problems generalize to larger instances
The approach enables zero-shot policy transfer in controller synthesis
Abstract
Controller synthesis is in essence a case of model-based planning for non-deterministic environments in which plans (actually "strategies") are meant to preserve system goals indefinitely. In the case of supervisory control, environments are specified as the parallel composition of state machines, and valid strategies are required to be "non-blocking" (i.e., always enabling the environment to reach certain marked states) in addition to safe (i.e., keeping the system within a safe zone). Recently, On-the-fly Directed Controller Synthesis techniques were proposed to avoid the exploration of the entire (and exponentially large) environment space, at the cost of non-maximal permissiveness, to either find a strategy or conclude that there is none. The incremental exploration of the plant is currently guided by a domain-independent human-designed heuristic. In this work, we propose a new method…
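The heuristic-guided incremental exploration described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it only shows how a scoring function decides which frontier transition to expand next, and it stops as soon as a marked state is discovered instead of checking the non-blocking and safety conditions. The names `explore`, `score`, `toy_plant`, `prefer_b_d`, and `uniform` are hypothetical, and the hand-written scores stand in for whatever a learned value estimate would return.

```python
import heapq
from itertools import count
from typing import Callable, Dict, Hashable, List, Optional, Set, Tuple

State = Hashable
# Hypothetical plant encoding: state -> list of (event, next_state) pairs.
Plant = Dict[State, List[Tuple[str, State]]]


def explore(
    plant: Plant,
    initial: State,
    marked: Set[State],
    score: Callable[[State, str, State], float],
    budget: int = 1000,
) -> Tuple[Optional[State], int]:
    """Expand frontier transitions best-score-first.

    Returns (first marked state discovered or None, number of expansions).
    Illustrative only: a real synthesis procedure would also propagate
    winning/losing information instead of stopping at the first marked state.
    """
    tie = count()  # tie-breaker so the heap never compares states
    frontier: list = []
    explored = {initial}

    def push(state: State) -> None:
        for event, nxt in plant.get(state, []):
            # heapq is a min-heap, so negate the score to pop the best first.
            heapq.heappush(
                frontier, (-score(state, event, nxt), next(tie), state, event, nxt)
            )

    push(initial)
    expansions = 0
    while frontier and expansions < budget:
        _, _, _src, _event, dst = heapq.heappop(frontier)
        expansions += 1
        if dst in marked:
            return dst, expansions
        if dst not in explored:
            explored.add(dst)
            push(dst)
    return None, expansions


# Toy plant: the marked state 4 is reached through events "b" then "d".
toy_plant: Plant = {0: [("a", 1), ("b", 2)], 1: [("c", 3)], 2: [("d", 4)]}


def prefer_b_d(_src: State, event: str, _dst: State) -> float:
    """Hand-written stand-in for a learned heuristic."""
    return 1.0 if event in {"b", "d"} else 0.0


def uniform(_src: State, _event: str, _dst: State) -> float:
    """Uninformed baseline: every transition scores the same."""
    return 0.0


print(explore(toy_plant, 0, {4}, prefer_b_d))
print(explore(toy_plant, 0, {4}, uniform))
```

On this toy plant the informed scoring function reaches the marked state with fewer expansions than the uniform baseline, which is the kind of saving an exploration heuristic is meant to provide.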
Taxonomy
Topics: Formal Methods in Verification · Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics
Methods: Convolution · Dense Connections · Q-Learning · Deep Q-Network
