Computationally Efficient Safe Reinforcement Learning for Power Systems
Daniel Tabas, Baosen Zhang

TL;DR
This paper introduces a computationally efficient safe reinforcement learning method for power system frequency regulation, ensuring safety constraints are met without real-time optimization, and demonstrates its effectiveness on a 9-bus system.
Contribution
The paper develops a novel closed-form safety filter, built on set-theoretic control, that works with any policy gradient algorithm and enforces safety-critical state constraints in power system control without real-time optimization.
Findings
The safety filter guarantees safety constraints without real-time optimization.
The learned RL policy is more cost-effective than robust linear feedback control.
The approach surpasses penalty-based constrained RL methods in safety and performance.
Abstract
We propose a computationally efficient approach to safe reinforcement learning (RL) for frequency regulation in power systems with high levels of variable renewable energy resources. The approach draws on set-theoretic control techniques to craft a neural network-based control policy that is guaranteed to satisfy safety-critical state constraints, without needing to solve a model predictive control or projection problem in real time. By exploiting the properties of robust controlled-invariant polytopes, we construct a novel, closed-form "safety-filter" that enables end-to-end safe learning using any policy gradient-based RL algorithm. We then apply the safety filter in conjunction with the deep deterministic policy gradient (DDPG) algorithm to regulate frequency in a modified 9-bus power system, and show that the learned policy is more cost-effective than robust linear feedback control…
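The abstract does not spell out the closed-form filter, so the following is only a minimal sketch of one way such a filter could work, not necessarily the authors' construction. It assumes the set of safe actions is a fixed polytope {u : F u <= g} with g > 0 (so the origin lies strictly inside), and that the policy network emits a vector v in the unit infinity-norm ball, for example through a tanh output layer. A gauge (Minkowski functional) map then rescales v into the polytope using only a max and a division, with no optimization solve. The names `gauge` and `safety_filter`, and the example matrices, are hypothetical.

```python
# Illustrative sketch only: a gauge-map style filter under the assumptions
# stated above. Not taken from the paper; all names are hypothetical.

def gauge(F, g, v):
    """Minkowski gauge of v with respect to {u : F u <= g}, assuming g > 0."""
    return max(
        sum(f_ij * v_j for f_ij, v_j in zip(row, v)) / g_i
        for row, g_i in zip(F, g)
    )

def safety_filter(F, g, v):
    """Map v from the unit infinity-norm ball into the polytope {u : F u <= g}."""
    norm_inf = max(abs(x) for x in v)
    gam = gauge(F, g, v)
    if gam <= 0.0:
        # Every nonnegative multiple of v already satisfies F u <= g
        # (this covers v = 0), so v can be returned unchanged.
        return list(v)
    # The scaled point has gauge equal to norm_inf <= 1, so it lies in the polytope.
    scale = norm_inf / gam
    return [scale * x for x in v]

# Hypothetical safe set: |u1| <= 2, |u2| <= 0.5
F = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
g = [2.0, 2.0, 0.5, 0.5]
u = safety_filter(F, g, [1.0, 1.0])
```

Because the map is built from differentiable pieces almost everywhere, a filter of this kind could in principle sit after the policy network during training. In a real frequency regulation setting the safe action set would generally depend on the current state, which this fixed-polytope sketch does not model.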
Taxonomy
Topics: Power System Optimization and Stability · Optimal Power Flow Distribution · Smart Grid Security and Resilience
Methods: Adam · Dense Connections · Experience Replay · Batch Normalization · Weight Decay · Convolution · Deep Deterministic Policy Gradient
