TOAST: Fast and scalable auto-partitioning based on principled static analysis
Sami Alabed, Dominik Grewe, Norman Alexander Rink, Masha Samsikova, Timur Sitdikov, Agnieszka Swietlik, Dimitrios Vytiniotis, Daniel Belov

TL;DR
TOAST introduces a fast, scalable auto-partitioning system for large machine learning models by combining static analysis with Monte Carlo Tree Search, improving efficiency and solution quality over existing methods.
Contribution
The paper presents a novel static compiler analysis, integrated with Monte Carlo Tree Search, to efficiently explore partitioning options for large models.
Findings
Outperforms state-of-the-art industrial partitioners
Discovers previously unknown, superior partitioning solutions
Fully automated process for complex models
Abstract
Partitioning large machine learning models across distributed accelerator systems is a complex process, requiring a series of interdependent decisions that are further complicated by internal sharding ambiguities. Consequently, existing auto-partitioners often suffer from out-of-memory errors or are prohibitively slow when exploring the exponentially large space of possible partitionings. To mitigate this, they artificially restrict the search space, but this approach frequently yields infeasible solutions that violate device memory constraints or lead to sub-optimal performance. We propose a system that combines a novel static compiler analysis with a Monte Carlo Tree Search. Our analysis constructs an efficient decision space by identifying (i) tensor dimensions requiring identical sharding, and (ii) partitioning "conflicts" that require resolution. Our system significantly…
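The two outputs of the analysis named in the abstract, groups of tensor dimensions that must share a sharding and "conflicts" that need resolution, can be illustrated with a small toy. The sketch below is not the paper's algorithm. It assumes a union-find over hypothetical (tensor, axis) pairs, hand-written rules for two operations (matmul and transpose), and one possible reading of a conflict: a group that contains two different axes of the same tensor, so a single sharding choice would be applied to that tensor twice. All class and function names are made up for illustration.

```python
from collections import defaultdict


class DimGroups:
    """Union-find over (tensor_name, axis) pairs (illustrative only)."""

    def __init__(self):
        self.parent = {}

    def find(self, d):
        self.parent.setdefault(d, d)
        while self.parent[d] != d:
            self.parent[d] = self.parent[self.parent[d]]  # path halving
            d = self.parent[d]
        return d

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def groups(self):
        out = defaultdict(set)
        for d in list(self.parent):
            out[self.find(d)].add(d)
        return list(out.values())


def matmul(g, a, b, out):
    # out[i, j] = sum_k a[i, k] * b[k, j]
    g.union((a, 0), (out, 0))  # row dimension i
    g.union((b, 1), (out, 1))  # column dimension j
    g.union((a, 1), (b, 0))    # contracted dimension k


def transpose(g, a, out):
    # out[j, i] = a[i, j]
    g.union((a, 0), (out, 1))
    g.union((a, 1), (out, 0))


def conflicts(g):
    """Return groups holding two different axes of the same tensor
    (an assumed definition of a conflict, for illustration)."""
    bad = []
    for grp in g.groups():
        axes_by_tensor = defaultdict(set)
        for tensor, axis in grp:
            axes_by_tensor[tensor].add(axis)
        if any(len(axes) > 1 for axes in axes_by_tensor.values()):
            bad.append(grp)
    return bad


# y = x @ w: three independent groups, no conflict.
g = DimGroups()
matmul(g, "x", "w", "y")
groups_chain = g.groups()
no_conflict = conflicts(g)

# s = x @ x.T: both axes of s end up tied to axis 0 of x.
h = DimGroups()
transpose(h, "x", "xt")
matmul(h, "x", "xt", "s")
conflict_groups = conflicts(h)
```

In this toy, a search procedure would only have to choose one sharding per group rather than one per tensor dimension, and would have to make an extra decision for each conflicting group. How TOAST actually builds and resolves these structures is described in the paper, not here.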
