ZenFlow: Enabling Stall-Free Offloading Training via Asynchronous Updates

Tingfeng Lan; Yusen Wu; Bin Ma; Zhaoyuan Su; Rui Yang; Tekin Bicer; Masahiro Tanaka; Olatunji Ruwase; Dong Li; Yue Cheng

arXiv:2505.12242·cs.DC·August 6, 2025

Open Access

TL;DR

ZenFlow is a novel offloading framework for large language model training that prioritizes important parameters, enabling asynchronous updates and significantly reducing GPU stalls and PCIe traffic while maintaining model accuracy.

Contribution

ZenFlow introduces a parameter prioritization and asynchronous update mechanism that decouples GPU and CPU updates, improving training efficiency for large models.

Findings

01. Up to 5x end-to-end training speedup
02. 2x reduction in PCIe traffic
03. Over 85% reduction in GPU stalls

Abstract

Fine-tuning large language models (LLMs) often exceeds GPU memory limits, prompting systems to offload model states to CPU memory. However, existing offloaded training frameworks like ZeRO-Offload treat all parameters equally and update the full model on the CPU, causing severe GPU stalls, where fast, expensive GPUs sit idle waiting for slow CPU updates and limited-bandwidth PCIe transfers. We present ZenFlow, a new offloading framework that prioritizes important parameters and decouples updates between GPU and CPU. ZenFlow performs in-place updates of important gradients on GPU, while asynchronously offloading and accumulating less important ones on CPU, fully overlapping CPU work with GPU computation. To scale across GPUs, ZenFlow introduces a lightweight gradient selection method that exploits a novel spatial and temporal locality property of important gradients, avoiding costly…
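The split update described in the abstract can be illustrated with a toy, single-process sketch. It is not ZenFlow's implementation. It assumes that importance is measured by gradient magnitude (top-k), that the optimizer is plain SGD, and that the deferred gradients are applied by an explicit flush. The abstract specifies none of these details. The names `select_important`, `train_step`, and `flush_deferred` are hypothetical. The asynchrony and the GPU/CPU placement are only indicated in comments.

```python
# Toy sketch of prioritized, decoupled updates (assumptions: top-k by
# gradient magnitude, plain SGD, synchronous execution on one device).

def select_important(grads, k):
    """Return the indices of the k largest-magnitude gradient entries."""
    order = sorted(range(len(grads)), key=lambda i: abs(grads[i]), reverse=True)
    return set(order[:k])


def train_step(params, grads, deferred, k, lr):
    """Update important entries now; accumulate the rest for later."""
    important = select_important(grads, k)
    for i, g in enumerate(grads):
        if i in important:
            # Stands in for the immediate in-place update on the GPU.
            params[i] -= lr * g
        else:
            # Stands in for gradients offloaded to and accumulated on the CPU.
            deferred[i] += g
    return important


def flush_deferred(params, deferred, lr):
    """Apply the accumulated gradients, then reset the accumulator."""
    for i, g in enumerate(deferred):
        params[i] -= lr * g
        deferred[i] = 0.0


params = [1.0, 1.0, 1.0, 1.0]
deferred = [0.0] * len(params)
important = train_step(params, [0.5, -2.0, 0.1, 1.0], deferred, k=2, lr=0.1)
flush_deferred(params, deferred, lr=0.1)
```

In this sketch the flush runs inline. In an offloading system the accumulation and deferred update would run on the CPU concurrently with the next GPU iterations, which is the overlap the abstract refers to.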

Taxonomy

Topics: UAV Applications and Optimization · Distributed Control Multi-Agent Systems · IoT and Edge/Fog Computing

Methods: ZeRO-Offload