Automatic Discovery of Composite SPMD Partitioning Strategies in PartIR

Sami Alabed; Dominik Grewe; Juliana Franco; Bart Chrzaszcz; Tom Natan; Tamara Norman; Norman A. Rink; Dimitrios Vytiniotis; Michael Schaarschmidt

arXiv:2210.06352 · cs.DC · October 20, 2022 · 1 citation

Open Access

TL;DR

This paper introduces an automatic partitioner that combines Monte Carlo Tree Search with partition-specific compiler analysis to find SPMD partitioning strategies for large neural network training. The strategies it finds match expert-designed ones, reducing manual effort.

Contribution

It presents a novel goal-oriented search method that automatically discovers effective composite SPMD partitioning strategies for neural networks.

Findings

01 The partitioner matches expert-level strategies across various models.

02 Monte Carlo Tree Search effectively explores partitioning options.

03 Compiler analysis guides the search towards efficient solutions.

Abstract

Large neural network models are commonly trained through a combination of advanced parallelism strategies in a single-program, multiple-data (SPMD) paradigm. For example, training large transformer models requires combining data, model, and pipeline partitioning with optimizer sharding techniques. However, identifying efficient combinations for many model architectures and accelerator systems requires significant manual analysis. In this work, we present an automatic partitioner that identifies these combinations through a goal-oriented search. Our key finding is that a Monte Carlo Tree Search-based partitioner that integrates partition-specific compiler analysis directly into the search, together with guided goals, matches expert-level strategies for various models.
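To make the idea of searching over partitioning decisions concrete, here is a minimal Python sketch of Monte Carlo Tree Search over per-layer partitioning choices. It is an illustration under stated assumptions, not the paper's implementation: the action names, the four-layer setup, `toy_cost`, and all constants are hypothetical, and the made-up cost function merely stands in for the compiler analysis and goals the abstract mentions.

```python
import math
import random

# Hypothetical action set: one partitioning choice per layer (not PartIR's real actions).
ACTIONS = ("replicate", "shard_batch", "shard_model")
NUM_LAYERS = 4
# Made-up per-layer costs; a real system would query a compiler or cost model.
PER_LAYER_COST = {"replicate": 4.0, "shard_batch": 2.0, "shard_model": 1.5}
MAX_COST = 16.0  # upper bound on toy_cost, used to scale rewards into [0, 1]


def toy_cost(strategy):
    """Sum of per-layer costs plus 1.0 whenever adjacent layers use different
    layouts (an assumed stand-in for resharding communication)."""
    cost = sum(PER_LAYER_COST[a] for a in strategy)
    cost += sum(1.0 for a, b in zip(strategy, strategy[1:]) if a != b)
    return cost


class Node:
    def __init__(self, prefix):
        self.prefix = prefix      # actions chosen so far, one per layer
        self.children = {}        # action -> Node
        self.visits = 0
        self.total_reward = 0.0


def uct_score(child, parent_visits, c):
    # Standard UCT: mean reward plus an exploration bonus.
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def search(iterations=3000, seed=0, c=1.4):
    rng = random.Random(seed)
    root = Node(())
    best = None
    for _ in range(iterations):
        node, path = root, [root]
        # Selection: descend by UCT until a node with an untried action is found.
        while len(node.prefix) < NUM_LAYERS:
            untried = [a for a in ACTIONS if a not in node.children]
            if untried:
                # Expansion: add one new child and stop descending.
                action = rng.choice(untried)
                child = Node(node.prefix + (action,))
                node.children[action] = child
                node = child
                path.append(node)
                break
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: uct_score(ch, parent.visits, c))
            path.append(node)
        # Rollout: complete the partial strategy with random actions.
        remaining = NUM_LAYERS - len(node.prefix)
        strategy = node.prefix + tuple(rng.choice(ACTIONS) for _ in range(remaining))
        cost = toy_cost(strategy)
        if best is None or cost < best[1]:
            best = (strategy, cost)
        # Backpropagation: lower cost means higher reward.
        reward = 1.0 - cost / MAX_COST
        for visited in path:
            visited.visits += 1
            visited.total_reward += reward
    return best


if __name__ == "__main__":
    strategy, cost = search()
    print(strategy, cost)
```

With only 81 possible strategies, exhaustive enumeration would be simpler here; the sketch only shows the select, expand, rollout, and backpropagate loop that becomes useful when the decision space is too large to enumerate.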

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Machine Learning in Materials Science