Hydraulis: Balancing Large Transformer Model Training via Co-designing Parallel Strategies and Data Assignment

Haoyang Li; Fangcheng Fu; Sheng Lin; Hao Ge; Xuanyu Wang; Jiawen Niu; Jinbao Xue; Yangyu Tao; Di Wang; Jie Jiang; Bin Cui

arXiv:2412.07894 · cs.DC · October 16, 2025

Open Access

TL;DR

Hydraulis is a system that improves large Transformer training efficiency by co-optimizing parallel strategies and data assignment to address workload imbalances caused by data sampling and packing issues.

Contribution

It introduces a dynamic heterogeneous parallel strategy and a two-stage data assignment approach to balance training workloads in large Transformer models.

Findings

01  Hydraulis outperforms existing systems by 1.32-2.66×.

02  It effectively mitigates data sampling and data packing imbalances.

03  It enhances training efficiency for large Transformer models.

Abstract

To optimize large Transformer model training, both efficient parallel computing and advanced data management are indispensable. However, current methods often assume a stable and uniform training workload, neglecting data-induced imbalances, arising from both sampling and packing processes, which can impede training performance. Specifically, data sampling imbalance arises from the uneven sequence length distribution of the training data, while data packing imbalance stems from the discrepancy between the linear memory complexity and the quadratic time complexity of the attention mechanism. To address these imbalance issues, we develop Hydraulis, which jointly optimizes the parallel strategies and data assignment. For one thing, we introduce large model training with dynamic heterogeneous parallel strategies in response to the sequence length variations within and across training iterations. For…
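The packing imbalance described in the abstract can be illustrated with a small sketch. It is not taken from the paper: the two packs, their sequence lengths, and the helper names `token_count` and `attention_cost` are hypothetical, constant factors are ignored, and each packed sequence is assumed to attend only to itself. Under those assumptions, two packs that hold the same number of tokens (and so have similar memory footprints) can differ widely in attention compute.

```python
# Illustrative sketch only: lengths and helper names are hypothetical,
# and constant factors (heads, hidden size, FLOPs per op) are ignored.

def token_count(pack):
    """Proxy for memory use: linear in the total number of packed tokens."""
    return sum(pack)

def attention_cost(pack):
    """Proxy for attention time: quadratic in each sequence's own length,
    assuming packed sequences do not attend to one another."""
    return sum(n * n for n in pack)

pack_a = [8192]        # one long sequence
pack_b = [1024] * 8    # eight short sequences, same total token count

ratio = attention_cost(pack_a) / attention_cost(pack_b)

print("tokens:", token_count(pack_a), token_count(pack_b))
print("attention cost ratio (a / b):", ratio)
```

Both packs fill the same token budget, yet the single long sequence costs eight times as much under this quadratic proxy, which is the kind of mismatch that balancing by token count alone would miss.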

Taxonomy

Topics: Power Line Inspection Robots · Advanced Neural Network Applications · Oil and Gas Production Techniques

Methods: Attention Is All You Need · Adam · Dropout · Position-Wise Feed-Forward Layer · Softmax · Dense Connections · Byte Pair Encoding · Linear Layer · Multi-Head Attention · Label Smoothing