Pipette: Automatic Fine-grained Large Language Model Training Configurator for Real-World Clusters
Jinkyu Yim, Jaeyong Song, Yerim Choi, Jaebeen Lee, Jaewon Jung, Hongsun Jang, Jinho Lee

TL;DR
Pipette is an automatic configurator that optimizes large language model training on real-world GPU clusters. It accounts for interconnect heterogeneity, communication cost, and memory constraints, and produces configurations that train faster and fit within GPU memory.
Contribution
It introduces a fine-grained, performance-aware configuration method that accounts for real-world cluster heterogeneity and memory limits, improving over prior approaches.
Findings
Achieves significant speedup over previous methods.
Provides configurations that satisfy memory constraints.
Effectively models heterogeneous interconnect bandwidths.
Abstract
Training large language models (LLMs) is known to be challenging because of the huge computational and memory capacity requirements. To address these issues, it is common to use a cluster of GPUs with 3D parallelism, which splits a model along the data batch, pipeline stage, and intra-layer tensor dimensions. However, the use of 3D parallelism produces the additional challenge of finding the optimal number of ways on each dimension and mapping the split models onto the GPUs. Several previous studies have attempted to automatically find the optimal configuration, but many of these lacked several important aspects. For instance, the heterogeneous nature of the interconnect speeds is often ignored. While the peak bandwidths for the interconnects are usually made equal, the actual attained bandwidth varies per link in real-world clusters. Combined with the critical path modeling that does…
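To make the configuration search space described in the abstract concrete, the following is a minimal sketch of how candidate 3D-parallel configurations could be enumerated for a fixed GPU count. It is not Pipette's algorithm. The names `ParallelConfig` and `enumerate_configs` are hypothetical, and the two pruning rules (tensor parallelism stays within one node, and the pipeline stage count divides the layer count evenly) are simplifying assumptions, not constraints stated in the source.

```python
from typing import Iterator, NamedTuple


class ParallelConfig(NamedTuple):
    """Hypothetical container for the number of ways on each dimension."""
    data: int      # data-parallel ways (splits the batch)
    pipeline: int  # pipeline-parallel ways (number of stages)
    tensor: int    # tensor-parallel ways (intra-layer split)


def enumerate_configs(
    num_gpus: int, num_layers: int, gpus_per_node: int
) -> Iterator[ParallelConfig]:
    """Yield every (data, pipeline, tensor) split whose product is num_gpus.

    Assumptions of this sketch (not taken from the paper):
      * tensor ways must divide gpus_per_node, so a tensor group fits in a node;
      * pipeline ways must divide num_layers, so stages hold equal layer counts.
    """
    for tensor in range(1, gpus_per_node + 1):
        if num_gpus % tensor or gpus_per_node % tensor:
            continue
        rest = num_gpus // tensor
        for pipeline in range(1, rest + 1):
            if rest % pipeline or num_layers % pipeline:
                continue
            yield ParallelConfig(rest // pipeline, pipeline, tensor)


# Example: a 2-node cluster with 8 GPUs per node and a 24-layer model.
configs = list(enumerate_configs(num_gpus=16, num_layers=24, gpus_per_node=8))
for cfg in configs:
    print(cfg)
```

Enumerating the splits is only the first step. A configurator in the spirit of the abstract would additionally have to estimate the training time of each candidate, check it against GPU memory, and decide which physical GPU each shard is mapped to, since attained link bandwidth differs across the cluster.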
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
