GradMAP: Gradient-Based Multi-Agent Proximal Learning for Grid-Edge Flexibility

Yihong Zhou, Hongtai Zeng, and Thomas Morstyn

arXiv:2604.24549 · cs.LG · April 28, 2026

TL;DR

GradMAP is a decentralized, gradient-based learning method that efficiently trains neural-network policies for grid-edge devices while respecting AC network physics, achieving fast convergence and low constraint violations.
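
To make the deployment setting concrete, the sketch below shows fully decentralized execution in plain Python: each agent holds its own small network and maps only its local observation to an action. The class `LocalPolicy`, the function `decentralised_step`, the network size and the tanh output scaling are hypothetical choices for illustration, not GradMAP's actual architecture.

```python
import math
import random


class LocalPolicy:
    """Hypothetical per-agent policy: a one-hidden-layer tanh network.

    Every agent gets its own instance, so no weights are shared.
    """

    def __init__(self, n_obs, n_hidden=8, seed=0):
        rng = random.Random(seed)  # independent initialisation per agent
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_obs)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(n_hidden)]

    def act(self, local_obs):
        # Uses only this agent's own observation: no messages from other agents.
        hidden = [math.tanh(sum(w * o for w, o in zip(row, local_obs)))
                  for row in self.w1]
        # Output squashed to [-1, 1], e.g. a normalised set-point (assumption).
        return math.tanh(sum(w * h for w, h in zip(self.w2, hidden)))


def decentralised_step(policies, observations):
    """Each agent maps its own observation to an action; nothing is exchanged."""
    return [pi.act(obs) for pi, obs in zip(policies, observations)]


# Usage: 1,000 independent agents, each seeing a 3-dimensional local observation.
rng = random.Random(42)
policies = [LocalPolicy(n_obs=3, seed=i) for i in range(1000)]
observations = [[rng.random() for _ in range(3)] for _ in range(1000)]
actions = decentralised_step(policies, observations)
```

Because every agent owns a separate instance, changing one agent's observation leaves every other agent's action unchanged, which is the property that lets such policies run online without communication.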

Contribution

It introduces a novel multi-agent proximal learning approach that embeds a differentiable three-phase AC power-flow model in training for the decentralized control of grid-edge devices.
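
Differentiating through a power-flow solve is commonly done with the implicit function theorem: if g(v, p) = 0 defines the voltage v as a function of the load p, then dv/dp = -(dg/dp) / (dg/dv) at the converged solution, with no need to backpropagate through the solver's iterations. The minimal sketch below assumes a toy single-phase, two-bus feeder with a purely resistive line, far simpler than the three-phase AC model GradMAP embeds; the function names and parameter values are hypothetical.

```python
def residual(v, p, v0=1.0, r=0.05):
    """Power-balance mismatch at the load bus of a toy two-bus resistive feeder.

    v0: slack-bus voltage (p.u.), r: line resistance (p.u.), p: load (p.u.).
    Power delivered through the line to the load bus is v * (v0 - v) / r.
    """
    return v * (v0 - v) / r - p


def solve_voltage(p, v0=1.0, r=0.05, tol=1e-12, max_iter=50):
    """Newton-Raphson power flow from a flat start (high-voltage solution)."""
    v = v0
    for _ in range(max_iter):
        dg_dv = (v0 - 2.0 * v) / r
        step = residual(v, p, v0, r) / dg_dv
        v -= step
        if abs(step) < tol:
            return v
    raise RuntimeError("power flow did not converge")


def voltage_sensitivity(p, v0=1.0, r=0.05):
    """dv/dp by the implicit function theorem: dv/dp = -(dg/dp) / (dg/dv).

    Only the converged solution is used; the Newton iterates are not
    differentiated through.
    """
    v = solve_voltage(p, v0, r)
    dg_dv = (v0 - 2.0 * v) / r
    dg_dp = -1.0
    return v, -dg_dp / dg_dv


v, dv_dp = voltage_sensitivity(0.5)
```

For p = 0.5 p.u. with the defaults this gives v ≈ 0.974 p.u. and dv/dp ≈ -0.053: more load lowers the voltage. Multiplying such a sensitivity by the gradient of a voltage-violation penalty yields, via the chain rule, the signal that reaches an agent's action. In a full network, g becomes a vector of power-balance equations and the scalar division becomes a linear solve with the power-flow Jacobian.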

Findings

1. Learns policies for 1,000 agents managing various devices within 15 minutes.
2. Trains 3-5x faster than existing baseline methods.
3. Achieves low operating costs and minimal constraint violations in the reported tests.

Abstract

Coordinating large populations of grid-edge devices requires learning methods that remain fully decentralised in deployment while still respecting three-phase AC distribution-network physics. This paper proposes gradient-based multi-agent proximal learning (GradMAP) to address this challenge. GradMAP trains independent neural-network policies for each agent without any parameter sharing, and each agent uses only its own local observation for online decision-making without communication. During offline training, GradMAP embeds a differentiable three-phase AC power-flow model in a primal-dual learning loop and uses implicit differentiation to propagate exact network-constraint violations to update the policy parameters. To speed up training, GradMAP reuses expensive environment gradients through a proximal surrogate within a trust region defined in the more direct policy-output (action)…
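
The training loop described in the abstract combines two ideas: dual variables that price network-constraint violations, and a proximal surrogate that lets one expensive environment gradient drive several cheap policy updates. The sketch below illustrates both on a toy problem under stated assumptions: a single operating scenario, linear per-agent policies a_i = w_i * o_i + b_i, a quadratic discomfort cost, and one linear limit on total consumption standing in for the AC power-flow constraints. The surrogate form G_k·(a - a_k) + ||a - a_k||^2 / (2η), in which the quadratic term acts as a soft action-space trust region, is an assumption; the summary does not give GradMAP's exact surrogate. All names are hypothetical.

```python
def train_gradmap_sketch(obs, desired, cap, outer_iters=400, inner_steps=10,
                         eta=0.1, inner_lr=0.025, dual_lr=1.0):
    """Toy primal-dual loop with a proximal surrogate in action space.

    obs[i]     : local observation of agent i (scalar, hypothetical)
    desired[i] : agent i's preferred consumption (hypothetical)
    cap        : shared limit on total consumption (stand-in for the
                 network constraints of the real power-flow model)
    """
    n = len(obs)
    w = [0.0] * n  # each agent owns its own (w_i, b_i): no parameter sharing
    b = [0.0] * n
    lam = 0.0      # dual variable (price) for the network constraint
    env_grad_calls = 0

    def policy(i):
        return w[i] * obs[i] + b[i]

    def env_gradient(actions, lam):
        # Stand-in for the expensive step (power-flow solve plus implicit
        # differentiation): gradient of the Lagrangian w.r.t. each action.
        nonlocal env_grad_calls
        env_grad_calls += 1
        return [2.0 * (a - d) + lam for a, d in zip(actions, desired)]

    for _ in range(outer_iters):
        anchor = [policy(i) for i in range(n)]  # a_k
        grad = env_gradient(anchor, lam)        # G_k, computed once
        for _ in range(inner_steps):            # reuse G_k several times
            for i in range(n):
                # d/da of G_k*(a - a_k) + (a - a_k)^2 / (2*eta)
                g_a = grad[i] + (policy(i) - anchor[i]) / eta
                # Chain rule through a_i = w_i * o_i + b_i
                w[i] -= inner_lr * g_a * obs[i]
                b[i] -= inner_lr * g_a
        total = sum(policy(i) for i in range(n))
        lam = max(0.0, lam + dual_lr * (total - cap))  # projected dual ascent
    return [policy(i) for i in range(n)], lam, env_grad_calls


obs = [0.0, 0.25, 0.5, 0.75, 1.0]
desired = [1.0 + o for o in obs]
actions, lam, calls = train_gradmap_sketch(obs, desired, cap=6.0)
```

With these inputs the toy problem has the closed-form KKT solution a_i = d_i - λ/2 with λ = 2(Σd_i - cap)/N = 0.6, and the loop approaches it while calling `env_gradient` only once per outer iteration. Raising `inner_steps` reuses each environment gradient for more parameter updates, which is the mechanism the abstract credits for faster training; the quadratic term keeps the policy outputs near a_k, where the linearised gradient remains a good guide.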