Rethinking Dynamic Networks and Heterogeneous Computing with Automatic Parallelization
Ruilong Wu, Xinjiao Li, Yisu Wang, Xinyu Chen, Dirk Kutscher

TL;DR
This paper presents an approach to automatic parallelization in heterogeneous, dynamically changing network environments. It improves training efficiency for large language models by modeling node heterogeneity, simulating candidate parallel configurations, and pruning infeasible strategies before they are evaluated.
Contribution
It introduces a simulation-based framework that considers node heterogeneity and network dynamics, enabling optimized workload distribution and faster configuration search.
Findings
Enhanced training performance on heterogeneous nodes
Improved adaptability in dynamic network scenarios
Significant reduction in search time through pruning
Abstract
Hybrid parallelism techniques are essential for efficiently training large language models (LLMs). Nevertheless, current automatic parallel planning frameworks often overlook the simultaneous consideration of node heterogeneity and dynamic network topology changes, limiting their effectiveness in practical applications. In this paper, we address these limitations by modeling heterogeneous nodes within dynamically changing network environments and leveraging simulation-based strategies to determine optimal parallel configurations. Our approach enables fine-grained workload allocation tailored for heterogeneous nodes and complex network scenarios, achieving performance competitive with state-of-the-art methods under regular and stable network conditions. Additionally, we introduce a strategy pruning technique to rapidly discard infeasible parallel configurations, substantially reducing…
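As an illustration of the prune-then-simulate idea described in the abstract, the following is a minimal sketch and not the authors' implementation. Every name in it (`factorizations`, `is_feasible`, `toy_step_time`, `search`) is hypothetical, the memory rule assumes the model is sharded evenly across tensor-parallel and pipeline-parallel ranks and bounded by the smallest device, and the cost function is a toy stand-in for a real simulator.

```python
from itertools import product


def factorizations(num_devices):
    """Yield every (dp, tp, pp) whose product equals num_devices."""
    for dp, tp in product(range(1, num_devices + 1), repeat=2):
        if num_devices % (dp * tp) == 0:
            yield dp, tp, num_devices // (dp * tp)


def is_feasible(tp, pp, model_mem_gb, min_device_mem_gb, num_layers):
    """Cheap check run before any simulation (assumed rule, not from the paper)."""
    if pp > num_layers:  # cannot have more pipeline stages than layers
        return False
    # Assumption: the model is split evenly over tp * pp ranks, so the
    # smallest device in a heterogeneous cluster is the binding constraint.
    return model_mem_gb / (tp * pp) <= min_device_mem_gb


def toy_step_time(dp, tp, pp, slowest_tflops=100.0, bandwidth_gbs=10.0,
                  work_tflop=4000.0, traffic_gb=2.0, microbatches=8):
    """Toy cost model standing in for a simulator; all constants are made up."""
    compute = work_tflop / (slowest_tflops * dp * tp * pp)
    bubble = 1 + (pp - 1) / microbatches          # pipeline idle time
    tp_comm = traffic_gb * (tp - 1) / bandwidth_gbs
    dp_comm = traffic_gb * (dp - 1) / (dp * bandwidth_gbs)
    return compute * bubble + tp_comm + dp_comm


def search(num_devices, model_mem_gb, device_mem_gb, num_layers, simulate):
    """Return (best_config, best_time, number_of_configs_simulated)."""
    best, best_time, simulated = None, float("inf"), 0
    for dp, tp, pp in factorizations(num_devices):
        if not is_feasible(tp, pp, model_mem_gb, min(device_mem_gb), num_layers):
            continue  # pruned: never reaches the (expensive) simulator
        simulated += 1
        step_time = simulate(dp, tp, pp)
        if step_time < best_time:
            best, best_time = (dp, tp, pp), step_time
    return best, best_time, simulated


if __name__ == "__main__":
    # Hypothetical cluster: four 24 GB devices and four 80 GB devices.
    memory = [24.0] * 4 + [80.0] * 4
    print(search(8, 160.0, memory, 32, toy_step_time))
```

The point of the sketch is the ordering: the feasibility check costs a few arithmetic operations, so running it first keeps the simulator from being invoked on configurations that could never fit. A framework that allocates work unevenly across heterogeneous nodes would need a finer-grained feasibility rule than the even split assumed here.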
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Neural Networks and Applications · Distributed and Parallel Computing Systems
Methods: Pruning
