Learning When to Act: Communication-Efficient Reinforcement Learning via Run-Time Assurance
Adam Haroon, Erick J. Rodríguez-Seda, Cody Fleming, Tristan Schuler

TL;DR
This paper introduces a run-time assurance framework for reinforcement learning that adaptively determines when an agent should act, improving safety and efficiency in control tasks like inverted pendulum and quadrotor stabilization.
Contribution
It proposes a novel approach combining Lyapunov-based safety guarantees with adaptive timing decisions, enabling safer and more communication-efficient RL policies.
Findings
Learned policies increase mean inter-sample interval by up to 3.51× over baselines.
Fixed LQR controllers are unstable, highlighting the importance of adaptive timing.
Lyapunov reward transferability allows environment generalization without retraining.
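The first finding is stated in terms of mean inter-sample interval (MSI). The page does not define the metric, so the sketch below assumes the usual reading: the average time between consecutive control updates sent to the plant. The function name and the timestamps are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def mean_inter_sample_interval(update_times):
    """Average gap between consecutive control-update timestamps (seconds).

    Assumed definition of MSI; the source does not spell it out.
    """
    if len(update_times) < 2:
        raise ValueError("need at least two update times")
    gaps = [later - earlier
            for earlier, later in zip(update_times, update_times[1:])]
    return sum(gaps) / len(gaps)


# Made-up timestamps: a periodic baseline versus a policy that
# communicates less often.
periodic_times = [0.0, 0.1, 0.2, 0.3]
adaptive_times = [0.0, 0.4, 0.7, 1.2]

msi_ratio = (mean_inter_sample_interval(adaptive_times)
             / mean_inter_sample_interval(periodic_times))
```

Under this reading, a larger ratio means the controller needs fewer updates over the same time span.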
Abstract
Safe reinforcement learning (RL) typically asks what an agent should do. We ask when it needs to act, and show that a single policy can jointly learn control inputs and communication-efficient timing decisions under a pointwise Lyapunov safety shield. We focus on stabilization around a known equilibrium, where CARE-based LQR backups, Lyapunov certificates, and classical Lyapunov-STC are well defined, enabling clean comparison against analytical baselines. A run-time assurance (RTA) layer overrides the policy via a one-step-ahead Lyapunov prediction and a precomputed LQR backup, providing a strictly stronger guarantee than constrained MDP methods that enforce safety only in expectation. On an inverted pendulum, cart-pole, and planar quadrotor, the learned policy achieves , , and higher mean inter-sample interval (MSI) than a…
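The RTA layer described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the plant is a double integrator (not one of the paper's benchmarks), chosen because its CARE has a closed-form solution for Q = I, R = 1; the acceptance test is simply that the predicted Lyapunov value must not increase; and the names `shield`, `predict`, and `lqr_backup` are hypothetical.

```python
import math

# Assumed plant: double integrator, state x = (position, velocity),
# dynamics d(position)/dt = velocity, d(velocity)/dt = u.
# For Q = I and R = 1 the CARE solution is P = [[sqrt(3), 1], [1, sqrt(3)]].
SQRT3 = math.sqrt(3.0)
P = ((SQRT3, 1.0), (1.0, SQRT3))  # Lyapunov matrix taken from the CARE
K = (1.0, SQRT3)                  # LQR gain K = R^{-1} B^T P
DT = 0.1                          # assumed sampling period in seconds


def lyapunov(x):
    """Quadratic Lyapunov candidate V(x) = x^T P x."""
    return (P[0][0] * x[0] * x[0]
            + 2.0 * P[0][1] * x[0] * x[1]
            + P[1][1] * x[1] * x[1])


def predict(x, u, dt=DT):
    """One-step-ahead state with u held constant over dt (exact for this plant)."""
    return (x[0] + x[1] * dt + 0.5 * u * dt * dt, x[1] + u * dt)


def lqr_backup(x):
    """Precomputed backup input u = -K x."""
    return -(K[0] * x[0] + K[1] * x[1])


def shield(x, u_proposed):
    """Return (u_applied, overridden).

    Accept the policy's input if the predicted Lyapunov value does not
    increase; otherwise fall back to the LQR backup.
    """
    if lyapunov(predict(x, u_proposed)) <= lyapunov(x):
        return u_proposed, False
    return lqr_backup(x), True


# A proposal that pushes the state away from the origin is overridden;
# one that moves it toward the origin passes through unchanged.
u_bad, bad_overridden = shield((1.0, 0.0), 5.0)
u_ok, ok_overridden = shield((1.0, 0.0), -1.0)
```

Because the check runs on every step rather than on an average over trajectories, it is pointwise in the sense the abstract contrasts with constrained MDP methods. The sketch does not model the timing decision itself, only the override of a single proposed input.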
