Conformal Signal Temporal Logic for Robust Reinforcement Learning Control: A Case Study
Hani Beirami, M M Manjurul Islam

TL;DR
This paper introduces a conformal STL shield that filters a reinforcement learning controller's actions at run time to improve safety and robustness in aerospace control. It is evaluated in a case study on the AeroBench F-16 simulator under both nominal conditions and a severe stress scenario.
Contribution
The paper proposes a conformal STL shield that filters RL actions online using conformal prediction, and reports stronger robustness than a classical rule-based STL shield on an aerospace control task.
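A minimal sketch of how an online conformal filter of this kind could work is given below. It is an illustration under stated assumptions, not the authors' implementation. The class `ConformalShield`, the one-step airspeed predictor `predict`, the fallback action, the sliding-window size, and every numeric value are hypothetical. The sketch keeps a window of recent absolute prediction errors, takes their conformal quantile as an uncertainty radius, and accepts the RL action only if the predicted airspeed stays inside the band after subtracting that radius.

```python
import math
from collections import deque


class ConformalShield:
    """Hypothetical action filter based on a sliding-window conformal quantile."""

    def __init__(self, alpha=0.1, window=200):
        self.alpha = alpha                  # target miscoverage level
        self.scores = deque(maxlen=window)  # recent nonconformity scores

    def update(self, predicted, observed):
        # Nonconformity score: absolute one-step prediction error.
        self.scores.append(abs(predicted - observed))

    def radius(self):
        # Conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score.
        n = len(self.scores)
        if n == 0:
            return math.inf
        k = math.ceil((n + 1) * (1 - self.alpha))
        if k > n:
            return math.inf  # too few scores for the requested level
        return sorted(self.scores)[k - 1]

    def filter(self, action, fallback, predict, v_cmd, band):
        # Predicted margin to the band edge, shrunk by the conformal radius.
        margin = band - abs(predict(action) - v_cmd)
        return action if margin - self.radius() >= 0 else fallback
```

With no calibration scores the radius is infinite, so the sketch always returns the fallback action until data has been observed. This is a deliberately conservative design choice for the illustration.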
Findings
The conformal STL shield maintains STL satisfaction under stress scenarios.
The conformal shield achieves near-baseline performance.
The conformal shield significantly improves robustness over the classical rule-based shield.
Abstract
We investigate how formal temporal logic specifications can enhance the safety and robustness of reinforcement learning (RL) control in aerospace applications. Using the open source AeroBench F-16 simulation benchmark, we train a Proximal Policy Optimization (PPO) agent to regulate engine throttle and track commanded airspeed. The control objective is encoded as a Signal Temporal Logic (STL) requirement to maintain airspeed within a prescribed band during the final seconds of each maneuver. To enforce this specification at run time, we introduce a conformal STL shield that filters the RL agent's actions using online conformal prediction. We compare three settings: (i) PPO baseline, (ii) PPO with a classical rule-based STL shield, and (iii) PPO with the proposed conformal shield, under both nominal conditions and a severe stress scenario involving aerodynamic model mismatch, actuator…
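The airspeed requirement described in the abstract can be illustrated with the standard quantitative (robustness) semantics of an STL "always" operator over a final time window. The sketch below assumes a uniformly sampled airspeed trace. The function name `final_window_robustness`, the parameter names, and the example values are hypothetical and are not taken from the paper. A positive return value means the trace stays inside the band throughout the window, and a negative value gives the size of the worst violation.

```python
import numpy as np


def final_window_robustness(airspeed, v_cmd, band, dt, window_s):
    """Robustness of G_[T - window_s, T] (|v - v_cmd| <= band) on a sampled trace."""
    airspeed = np.asarray(airspeed, dtype=float)
    # Number of samples that fall inside the final window (at least one).
    n_window = max(1, int(round(window_s / dt)))
    tail = airspeed[-n_window:]
    # "Always" takes the minimum of the per-sample margins to the band edge.
    return float(np.min(band - np.abs(tail - v_cmd)))


# Hypothetical trace sampled once per second with a 3-second final window.
trace = [500, 520, 540, 548, 552, 551]
rho = final_window_robustness(trace, v_cmd=550, band=5, dt=1.0, window_s=3.0)
```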
Taxonomy
Topics: Aerospace and Aviation Technology · Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control
