GradMAP: Gradient-Based Multi-Agent Proximal Learning for Grid-Edge Flexibility
Yihong Zhou, Hongtai Zeng, and Thomas Morstyn

TL;DR
GradMAP is a decentralized, gradient-based learning method for grid-edge devices that efficiently trains neural policies respecting AC network physics, achieving fast convergence and low constraint violations.
Contribution
It introduces a multi-agent proximal learning approach that embeds a differentiable three-phase AC power-flow model in training to learn decentralized control policies for grid-edge devices.
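The sketch below illustrates the implicit-differentiation step described in the abstract. It is a minimal example on a hypothetical single-phase, two-bus feeder, not the paper's three-phase AC model. A load draws active power `p` (the action) through a purely resistive line of resistance `r` from a source at voltage `v0`. The far-bus voltage therefore satisfies the residual F(v, p) = v^2 - v0*v + r*p = 0. Newton's method finds `v`. The implicit function theorem then gives dv/dp = -(dF/dp)/(dF/dv) without differentiating through the Newton iterations. The function names, the lower voltage limit `v_min`, and all parameter values are illustrative assumptions.

```python
import math


def solve_voltage(p, v0=1.0, r=0.05, tol=1e-12, max_iter=50):
    """Newton solve of the toy power-flow residual F(v, p) = v**2 - v0*v + r*p = 0."""
    v = v0  # flat start at the source voltage; converges to the physical (high) root
    for _ in range(max_iter):
        f = v * v - v0 * v + r * p
        df_dv = 2.0 * v - v0
        step = f / df_dv
        v -= step
        if abs(step) < tol:
            return v
    raise RuntimeError("Newton iteration did not converge")


def voltage_sensitivity(p, v0=1.0, r=0.05):
    """dv/dp via the implicit function theorem: dv/dp = -(dF/dp) / (dF/dv)."""
    v = solve_voltage(p, v0, r)
    df_dv = 2.0 * v - v0  # partial of F w.r.t. the state v, evaluated at the solution
    df_dp = r             # partial of F w.r.t. the action p
    return -df_dp / df_dv


def undervoltage_and_grad(p, v_min=0.95, v0=1.0, r=0.05):
    """Violation max(0, v_min - v(p)) and its gradient w.r.t. the action p."""
    v = solve_voltage(p, v0, r)
    if v >= v_min:
        return 0.0, 0.0
    # d(v_min - v)/dp = -dv/dp, which is positive: drawing more power deepens the violation
    return v_min - v, -voltage_sensitivity(p, v0, r)
```

The toy model has a real solution only while 4*r*p < v0**2. For a multi-bus, three-phase network, the residual is vector-valued. The scalar division then becomes a linear solve with the power-flow Jacobian dF/dv, and the rest of the calculation is unchanged.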
Findings
Learns policies for 1,000 agents managing various devices within 15 minutes.
Trains 3–5× faster than existing benchmark methods.
Delivers low operating costs and minimal constraint violations in tests.
Abstract
Coordinating large populations of grid-edge devices requires learning methods that remain fully decentralised in deployment while still respecting three-phase AC distribution-network physics. This paper proposes gradient-based multi-agent proximal learning (GradMAP) to address this challenge. GradMAP trains independent neural-network policies for each agent without any parameter sharing, and each agent uses only its own local observation for online decision-making without communication. During offline training, GradMAP embeds a differentiable three-phase AC power-flow model in a primal-dual learning loop and uses implicit differentiation to propagate exact network-constraint violations to update the policy parameters. To speed up training, GradMAP reuses expensive environment gradients through a proximal surrogate within a trust region defined in the more direct policy-output (action)…
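The sketch below is a hedged, toy-scale illustration of the two training ideas in the abstract: a primal-dual loop on constraint violations, and reuse of one expensive environment gradient through a proximal surrogate in action space. It rests on several assumptions that do not come from the paper.

- Linear per-agent policies `a_i = w[i]*s_i + b[i]` stand in for neural networks.
- A quadratic demand-tracking cost is used.
- A single shared feeder limit `cap` replaces three-phase power flow.
- An analytic action gradient stands in for the implicit-differentiated one.
- The surrogate takes the form g.(a - a_ref) + ||a - a_ref||^2 / (2*eta).

The names `train` and `mean_violation`, and the step sizes `lr`, `eta`, and `rho`, are hypothetical. Each agent acts only on its own observation `s_i` and has its own parameters (no sharing). Coupling enters only through the training gradient.

```python
import random


def mean_violation(w, b, cap, n_samples=2000, seed=123):
    """Average feeder-limit violation max(0, sum_i a_i - cap) on fresh observations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        obs = [rng.uniform(0.5, 1.5) for _ in w]
        actions = [wi * s + bi for wi, bi, s in zip(w, b, obs)]
        total += max(0.0, sum(actions) - cap)
    return total / n_samples


def train(n_agents=3, cap=2.0, outer_iters=300, inner_steps=5,
          lr=0.05, eta=0.5, rho=0.2, batch=64, seed=0):
    rng = random.Random(seed)
    w = [0.0] * n_agents  # per-agent slope (no parameter sharing)
    b = [0.0] * n_agents  # per-agent bias
    lam = 0.0             # dual variable for the shared feeder limit
    for _ in range(outer_iters):
        # Each agent observes only its own local demand s_i.
        obs = [[rng.uniform(0.5, 1.5) for _ in range(n_agents)] for _ in range(batch)]
        a_ref = [[w[i] * s[i] + b[i] for i in range(n_agents)] for s in obs]

        # "Expensive" step, done once per outer iteration: gradient w.r.t. actions of
        # the Lagrangian sum_i (a_i - s_i)^2 + lam * max(0, sum_i a_i - cap).
        grads, viols = [], []
        for s, a in zip(obs, a_ref):
            viol = max(0.0, sum(a) - cap)
            viols.append(viol)
            active = 1.0 if viol > 0.0 else 0.0
            grads.append([2.0 * (a[i] - s[i]) + lam * active for i in range(n_agents)])

        # Cheap inner loop: reuse the frozen action gradients through the proximal
        # surrogate g.(a - a_ref) + ||a - a_ref||^2 / (2*eta), which keeps the new
        # actions close to a_ref (a soft trust region in action space).
        for _ in range(inner_steps):
            gw = [0.0] * n_agents
            gb = [0.0] * n_agents
            for s, a0, g in zip(obs, a_ref, grads):
                for i in range(n_agents):
                    a = w[i] * s[i] + b[i]
                    d = g[i] + (a - a0[i]) / eta  # d(surrogate)/d(a_i)
                    gw[i] += d * s[i] / batch      # chain rule: da_i/dw_i = s_i
                    gb[i] += d / batch             # chain rule: da_i/db_i = 1
            for i in range(n_agents):
                w[i] -= lr * gw[i]
                b[i] -= lr * gb[i]

        # Dual ascent on the batch-average violation, projected onto lam >= 0.
        lam = max(0.0, lam + rho * sum(viols) / batch)
    return w, b, lam
```

As a usage check, call `train()` and compare `mean_violation(w, b, cap=2.0)` against the demand-following policy `w = [1, 1, 1]`, `b = [0, 0, 0]`. The trained policies should show a clearly lower violation. No specific numbers are claimed here.

The design choice works as follows. In action space, the surrogate's minimizer is `a_ref - eta * g`, so `eta` sets how far the inner steps may move the actions away from the point where the environment gradient was computed. The exact surrogate and trust-region definition GradMAP uses are not given in this excerpt.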
