Hierarchical Reinforcement Learning with Runtime Safety Shielding for Power Grid Operation

Gitesh Malik

arXiv:2604.14032 · cs.AI · April 16, 2026

TL;DR

This paper introduces a hierarchical reinforcement learning framework with a runtime safety shield for power-grid control. The design targets safer and more robust operation, and generalization to unseen scenarios without retraining.

Contribution

The paper proposes a hierarchical control architecture that decouples long-horizon decision-making from real-time safety enforcement, so that the grid can be operated safely and robustly in unseen scenarios.

Findings

01. Hierarchical approach outperforms flat RL under stress tests.

02. Safety shield ensures real-time safety invariants.

03. Zero-shot deployment achieves robust performance without retraining.

Abstract

Reinforcement learning has shown promise for automating power-grid operation tasks such as topology control and congestion management. However, its deployment in real-world power systems remains limited by strict safety requirements, brittleness under rare disturbances, and poor generalization to unseen grid topologies. In safety-critical infrastructure, catastrophic failures cannot be tolerated, and learning-based controllers must operate within hard physical constraints. This paper proposes a safety-constrained hierarchical control framework for power-grid operation that explicitly decouples long-horizon decision-making from real-time feasibility enforcement. A high-level reinforcement learning policy proposes abstract control actions, while a deterministic runtime safety shield filters unsafe actions using fast forward simulation. Safety is enforced as a runtime invariant,…
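
The filtering step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the names `shield`, `simulate`, `fallback`, and `max_loading`, the action labels, and the safety criterion (peak line loading below a threshold) are all hypothetical choices made for illustration. The high-level policy is assumed to supply a ranked list of candidate actions, and the shield deterministically returns the first one that a forward simulation predicts to be safe.

```python
from typing import Callable, Hashable, Optional, Sequence


def shield(
    ranked_actions: Sequence[Hashable],
    simulate: Callable[[Hashable], Optional[float]],
    fallback: Hashable,
    max_loading: float = 1.0,
) -> Hashable:
    """Return the first proposed action whose simulated outcome is safe.

    `simulate` is an assumed one-step forward model that returns the
    predicted peak line loading (1.0 = thermal limit), or None if the
    simulation fails, e.g. a diverged power flow.
    """
    for action in ranked_actions:
        predicted = simulate(action)
        if predicted is not None and predicted < max_loading:
            return action
    # No proposal passed the check: use the designated fallback action.
    return fallback


# Toy stand-in for a forward simulator (illustrative values only).
TOY_LOADING = {
    "reconfigure_bus_3": 1.12,  # would overload a line
    "reconnect_line_7": 0.91,   # within limits
}


def toy_simulate(action: Hashable) -> Optional[float]:
    # Unknown actions are treated as failed simulations.
    return TOY_LOADING.get(action)


chosen = shield(
    ["reconfigure_bus_3", "reconnect_line_7"],
    toy_simulate,
    fallback="do_nothing",
)
print(chosen)
```

Because the check runs at decision time rather than inside the reward, a design like this does not depend on what the policy has learned, which is one way to read the abstract's statement that safety is enforced as a runtime invariant.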
