TOAST: Fast and scalable auto-partitioning based on principled static analysis

Sami Alabed; Dominik Grewe; Norman Alexander Rink; Masha Samsikova; Timur Sitdikov; Agnieszka Swietlik; Dimitrios Vytiniotis; Daniel Belov

arXiv:2508.15010 · cs.LG · August 26, 2025

TL;DR

TOAST introduces a fast, scalable auto-partitioning system for large machine learning models by combining static analysis with Monte Carlo Tree Search, improving efficiency and solution quality over existing methods.

Contribution

The paper presents a novel static compiler analysis, integrated with Monte Carlo Tree Search, that efficiently explores partitioning options for large models.

Findings

01 Outperforms state-of-the-art industrial partitioners

02 Discovers previously unknown, superior partitioning solutions

03 Fully automated process for complex models

Abstract

Partitioning large machine learning models across distributed accelerator systems is a complex process, requiring a series of interdependent decisions that are further complicated by internal sharding ambiguities. Consequently, existing auto-partitioners often suffer from out-of-memory errors or are prohibitively slow when exploring the exponentially large space of possible partitionings. To mitigate this, they artificially restrict the search space, but this approach frequently yields infeasible solutions that violate device memory constraints or lead to sub-optimal performance. We propose a system that combines a novel static compiler analysis with a Monte Carlo Tree Search. Our analysis constructs an efficient decision space by identifying (i) tensor dimensions requiring identical sharding, and (ii) partitioning "conflicts" that require resolution. Our system significantly…
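To make the two outputs of the analysis more concrete, here is a minimal Python sketch, not the authors' implementation. It assumes that "tensor dimensions requiring identical sharding" can be modelled as equivalence classes built with a union-find structure, and that a "conflict" is a class containing two different dimensions of the same tensor (which could not both be sharded along the same device-mesh axis). That definition of a conflict, the class `DimGroups`, the helper functions, and the toy program `y = x @ w; s = y @ y^T` are all hypothetical and chosen for illustration only.

```python
from collections import defaultdict


class DimGroups:
    """Union-find over (tensor_name, dim_index) pairs (illustrative only)."""

    def __init__(self):
        self.parent = {}

    def find(self, dim):
        # Register unseen dimensions as their own singleton group.
        self.parent.setdefault(dim, dim)
        root = dim
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[dim] != root:
            self.parent[dim], dim = root, self.parent[dim]
        return root

    def union(self, a, b):
        # Record that dimensions a and b must be sharded identically.
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

    def groups(self):
        out = defaultdict(set)
        for dim in list(self.parent):
            out[self.find(dim)].add(dim)
        return list(out.values())


def find_conflicts(groups):
    """Assumed definition: a group is a conflict if one tensor appears twice."""
    result = []
    for group in groups:
        tensors = [tensor for tensor, _ in group]
        if len(tensors) != len(set(tensors)):
            result.append(group)
    return result


def build_example():
    dg = DimGroups()
    # y = x @ w: rows of x map to rows of y, the contracted dimension is
    # shared between x and w, and columns of w map to columns of y.
    dg.union(("x", 0), ("y", 0))
    dg.union(("x", 1), ("w", 0))
    dg.union(("w", 1), ("y", 1))
    # s = y @ y^T: both dimensions of s originate from dimension 0 of y.
    dg.union(("y", 0), ("s", 0))
    dg.union(("y", 0), ("s", 1))
    return dg


if __name__ == "__main__":
    example_groups = build_example().groups()
    for group in example_groups:
        print(sorted(group))
    print("conflicts:", [sorted(g) for g in find_conflicts(example_groups)])
```

Under these assumptions, a search procedure would only need to pick a sharding per group rather than per tensor dimension, and would treat each conflicting group as an explicit decision about which of the clashing dimensions to shard.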
