EasyRider: Mitigating Power Transients in Datacenter-Scale Training Workloads
Dillon Jensen, Obi Nnorom Jr., Grant Wilkins, Hugo Budd, Ram Rajagopal, Juan Rivas-Davila, and Phil Levis

TL;DR
EasyRider is a power management system designed to smooth out rapid power fluctuations in datacenter GPU workloads, protecting grid infrastructure without modifying training software.
Contribution
It introduces a novel power architecture combining passive components and active energy storage to mitigate power swings during AI training.
Findings
Reduces power ramp rates to within grid safety limits.
Works across diverse workload power profiles and hardware setups.
Extends energy storage lifetime through software that monitors the storage system under frequent charge/discharge cycles.
Abstract
Large-scale AI model training workloads use thousands of GPUs operating in tightly synchronized loops. During synchronous communication, start-up, shut-down, and checkpointing, GPU power consumption can swing from peak to idle within milliseconds. These large and rapid load swings endanger grid infrastructure because they induce steep power ramp rates, voltage and frequency shifts, and reactive power transients that can damage transformers, converters, and protection equipment. To solve this problem, we introduce EasyRider, a power architecture to mitigate power fluctuations at the rack level. EasyRider uses passive components and actively controlled auxiliary energy storage to attenuate rack power swings. A software system continually monitors the energy storage system to maximize its lifetime in the presence of frequent charge/discharge cycles. EasyRider filters rack power variations to be…
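The abstract describes auxiliary energy storage that absorbs rack power swings so the grid sees a gentler ramp. The sketch below illustrates only that general idea with a simple ramp-rate limiter: grid draw may change by at most a fixed number of watts per second, and storage charges or discharges to cover the difference. It is not EasyRider's controller. `Storage`, `ramp_limited_grid_draw`, and all parameter values are hypothetical, and the sketch models neither the passive components nor any lifetime-aware control.

```python
from dataclasses import dataclass


@dataclass
class Storage:
    capacity_j: float  # usable energy in joules (hypothetical value)
    soc: float         # state of charge in [0, 1]
    p_max_w: float     # charge/discharge power rating in watts


def ramp_limited_grid_draw(load_w, dt_s, max_ramp_w_per_s, storage):
    """Return the grid power trace seen upstream of a rack.

    Grid draw moves toward the rack load no faster than max_ramp_w_per_s;
    storage supplies (discharge, positive) or absorbs (charge, negative)
    the difference. If storage hits its power or energy limit, the grid
    picks up the remainder and the ramp limit is no longer guaranteed.
    """
    grid = [load_w[0]]
    step = max_ramp_w_per_s * dt_s  # largest allowed change per sample
    for p_load in load_w[1:]:
        prev = grid[-1]
        # Ramp-limited target: follow the load, clamped to prev +/- step.
        target = min(max(p_load, prev - step), prev + step)
        p_storage = p_load - target
        # Respect the storage power rating.
        p_storage = max(-storage.p_max_w, min(storage.p_max_w, p_storage))
        # Respect available energy (cannot over-discharge or over-charge).
        energy = p_storage * dt_s
        avail_discharge = storage.soc * storage.capacity_j
        avail_charge = (1.0 - storage.soc) * storage.capacity_j
        energy = max(-avail_charge, min(avail_discharge, energy))
        storage.soc -= energy / storage.capacity_j
        # Whatever storage does not cover comes from the grid.
        grid.append(p_load - energy / dt_s)
    return grid


# Example: a 10 kW rack drops to 1 kW for 50 ms, then returns to 10 kW.
load = [10000.0] * 5 + [1000.0] * 50 + [10000.0] * 50
store = Storage(capacity_j=1e5, soc=0.5, p_max_w=2e4)
grid = ramp_limited_grid_draw(load, dt_s=0.001, max_ramp_w_per_s=1e5, storage=store)
print(min(grid), round(store.soc, 4))
```

With a 100 kW/s limit and 1 ms samples, the grid trace falls by at most 100 W per sample, bottoming out near 5 kW instead of 1 kW, while storage absorbs the surplus during the dip and covers the shortfall on the way back up. A real system would also have to keep the state of charge in a band that leaves headroom for both directions of swing.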
