AsyncHZP: Hierarchical ZeRO Parallelism with Asynchronous Scheduling for Scalable LLM Training

Huawei Bai; Yifan Huang; Wenqi Shi; Ansheng You; Feifan Shao; Tengfei Han; Minghui Yu

arXiv:2510.20111 · cs.DC · October 24, 2025

TL;DR

AsyncHZP introduces an asynchronous hierarchical ZeRO parallelism method that adaptively reshards parameters and overlaps communication with computation, significantly improving scalability and efficiency in large-scale language model training.

Contribution

It proposes a novel asynchronous ZeRO variant with adaptive resharding and multi-stream scheduling, reducing communication overhead and simplifying large-scale LLM training.
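The abstract does not describe how the multi-stream scheduler works. The following is only a toy timing model of the general idea of overlapping parameter all-gathers with computation; it is not AsyncHZP's scheduler. The model rests on these assumptions:

- one communication stream and one compute stream;
- fixed per-layer gather and compute times;
- in-order layer execution;
- no bandwidth contention.

`Layer`, `serial_time`, `overlapped_time`, and `prefetch_depth` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    gather_ms: float   # time to all-gather this layer's sharded parameters
    compute_ms: float  # time to run this layer once its parameters are local


def serial_time(layers: List[Layer]) -> float:
    """One stream: each layer waits for its own gather, nothing overlaps."""
    return sum(layer.gather_ms + layer.compute_ms for layer in layers)


def overlapped_time(layers: List[Layer], prefetch_depth: int = 1) -> float:
    """Two streams: a communication stream issues gathers back to back while a
    compute stream runs each layer as soon as its parameters have arrived.

    prefetch_depth caps how many layers may be gathered ahead of the one being
    computed, i.e. how many extra full-parameter buffers memory can afford.
    prefetch_depth=0 degenerates to the serial schedule.
    """
    n = len(layers)
    gather_done = [0.0] * n
    compute_done = [0.0] * n
    comm_free = 0.0  # time at which the communication stream becomes idle
    for i, layer in enumerate(layers):
        # Layer i's gather needs a free buffer: layer i - prefetch_depth - 1
        # must have finished computing and released its gathered parameters.
        j = i - prefetch_depth - 1
        buffer_free = compute_done[j] if j >= 0 else 0.0
        gather_done[i] = max(comm_free, buffer_free) + layer.gather_ms
        comm_free = gather_done[i]
        # Compute runs in layer order and needs layer i's parameters locally.
        prev_done = compute_done[i - 1] if i > 0 else 0.0
        compute_done[i] = max(gather_done[i], prev_done) + layer.compute_ms
    return compute_done[-1] if n else 0.0


if __name__ == "__main__":
    compute_bound = [Layer(gather_ms=2.0, compute_ms=3.0) for _ in range(4)]
    print(serial_time(compute_bound))                        # 20.0
    print(overlapped_time(compute_bound))                    # 14.0
    print(overlapped_time(compute_bound, prefetch_depth=0))  # 20.0
```

With 2 ms gathers and 3 ms of compute per layer, only the first gather is exposed once overlap is enabled (2 + 4 × 3 = 14 ms instead of 20 ms). When gathers are slower than compute, the schedule becomes communication-bound. The total then approaches the sum of the gather times plus the last layer's compute.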

Findings

1. Outperforms classic ND parallelism in efficiency and scalability
2. Maintains stability at large scale for both dense and MoE models
3. Achieves state-of-the-art performance with less tuning

Abstract

The training efficiency and scalability of language models on massive clusters remain a critical bottleneck. Mainstream approaches like ND parallelism are often cumbersome and complex, while flexible alternatives such as the Zero Redundancy Optimizer (ZeRO) are frequently hampered by communication overhead. In this paper, we propose Asynchronous Hierarchical ZeRO Parallelism (AsyncHZP), a novel asynchronous variant of ZeRO designed to achieve superior performance while maintaining simplicity and memory efficiency. Unlike traditional ZeRO, which employs overly fine-grained sharding that can lead to inefficient communication, AsyncHZP adaptively reshards parameters, gradients, and optimizer states across different replica groups. This strategy optimizes device memory utilization and significantly reduces communication overhead. In addition, we design a multi-stream…
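The abstract states the goal (coarser, hierarchical sharding across replica groups) but not the mechanics. As a rough illustration, the sketch below models a generic two-level ZeRO layout. Each state is sharded within a group of `*_shard` ranks and replicated across the remaining groups. It is not AsyncHZP's resharding algorithm. `ShardPlan` and the helper functions are hypothetical. The byte counts rest on these assumptions:

- bf16 parameters and gradients;
- fp32 Adam master weights plus two moments, i.e. 12 bytes per parameter, as in the original ZeRO memory accounting;
- ring collectives;
- contiguous rank placement with 8 GPUs per node.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ShardPlan:
    """Hypothetical plan: how many ranks jointly hold one full copy of each state."""
    world_size: int
    param_shard: int
    grad_shard: int
    optim_shard: int

    def __post_init__(self) -> None:
        for s in (self.param_shard, self.grad_shard, self.optim_shard):
            if s <= 0 or self.world_size % s != 0:
                raise ValueError(f"shard size {s} must divide world size {self.world_size}")


def shard_groups(world_size: int, shard: int) -> List[List[int]]:
    """Contiguous blocks of `shard` ranks; each block holds one full copy."""
    return [list(range(g * shard, (g + 1) * shard)) for g in range(world_size // shard)]


def replica_groups(world_size: int, shard: int) -> List[List[int]]:
    """Ranks that hold the same slice, one from each shard group."""
    return [[g * shard + i for g in range(world_size // shard)] for i in range(shard)]


def per_device_bytes(plan: ShardPlan, n_params: int, param_bytes: int = 2,
                     grad_bytes: int = 2, optim_bytes: int = 12) -> float:
    """Model-state memory per device (memory accounting only; ignores
    activations and temporarily gathered buffers)."""
    return (n_params * param_bytes / plan.param_shard
            + n_params * grad_bytes / plan.grad_shard
            + n_params * optim_bytes / plan.optim_shard)


def allgather_bytes_per_device(plan: ShardPlan, n_params: int, param_bytes: int = 2) -> float:
    """Bytes each rank receives in a ring all-gather of the full parameters."""
    s = plan.param_shard
    return (s - 1) / s * n_params * param_bytes


def allgather_stays_in_node(plan: ShardPlan, gpus_per_node: int) -> bool:
    """With contiguous shard groups, a group cannot straddle a node boundary
    when its size divides the node size."""
    return gpus_per_node % plan.param_shard == 0


if __name__ == "__main__":
    n = 8_000_000_000  # illustrative 8B-parameter model
    plans = {
        "flat ZeRO-3": ShardPlan(world_size=64, param_shard=64, grad_shard=64, optim_shard=64),
        "hierarchical": ShardPlan(world_size=64, param_shard=8, grad_shard=8, optim_shard=64),
    }
    for name, plan in plans.items():
        where = "intra-node" if allgather_stays_in_node(plan, gpus_per_node=8) else "crosses nodes"
        print(f"{name}: {per_device_bytes(plan, n) / 1e9:.2f} GB state/device, "
              f"{allgather_bytes_per_device(plan, n) / 1e9:.2f} GB gathered/device ({where})")
```

Consider an 8-billion-parameter model on 64 GPUs:

- Flat ZeRO-3 sharding needs 2 GB of model state per device, versus 5.5 GB for the hierarchical plan.
- The per-device all-gather volume is similar for both plans (15.75 GB vs 14 GB per full parameter gather).
- The difference is where that traffic flows. The flat plan's all-gather spans every rank and crosses node boundaries. The hierarchical plan keeps it inside one 8-GPU node, trading device memory for faster links.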

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Big Data and Digital Economy