Baechi: Fast Device Placement of Machine Learning Graphs
Beomyeol Jeon, Linda Cai, Chirag Shetty, Pallavi Srivastava, Jintao Jiang, Xiaolan Ke, Yitao Meng, Cong Xie, Indranil Gupta

TL;DR
Baechi is an algorithmic system for placing machine learning graphs across devices. It produces placement plans far faster than existing learning-based methods while achieving comparable training step times on memory-limited devices.
Contribution
Baechi is the first system to apply an algorithmic approach to device placement for ML training graphs on small clusters of memory-constrained devices. It offers provable performance bounds and much faster placement planning.
Findings
Placement planning is 654 to 206,000 times faster than state-of-the-art learning-based approaches.
Training step times are comparable to expert placements, within 6.2%.
Algorithms are mathematically within a constant factor of optimal.
Abstract
Machine Learning graphs (or models) can be challenging or impossible to train when either devices have limited memory, or models are large. To split the model across devices, learning-based approaches are still popular. While these result in model placements that train fast on data (i.e., low step times), learning-based model-parallelism is time-consuming, taking many hours or days to create a placement plan of operators on devices. We present the Baechi system, the first to adopt an algorithmic approach to the placement problem for running machine learning training graphs on small clusters of memory-constrained devices. We integrate our implementation of Baechi into two popular open-source learning frameworks: TensorFlow and PyTorch. Our experimental results using GPUs show that: (i) Baechi generates placement plans 654× to 206K× faster than state-of-the-art learning-based approaches,…
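To give a feel for what an algorithmic placer does, the following is a minimal sketch of memory-constrained earliest-finish-time list scheduling, a classic greedy heuristic. It is not Baechi's code. The `Op` and `place` names, the per-operator compute times, memory sizes, and cross-device transfer times are all hypothetical. The sketch visits operators in topological order. It puts each operator on the device, among those with enough free memory, where the operator would finish earliest, and it charges a transfer delay whenever an input was produced on a different device. As a conservative simplification, it also assumes each operator's memory stays occupied for the whole training step.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass
class Op:
    name: str
    compute: float  # hypothetical: run time of the op on any device
    memory: int     # hypothetical: memory the op occupies for the whole step
    # hypothetical: predecessor name -> transfer time if it sits on another device
    preds: dict[str, float] = field(default_factory=dict)


def place(ops: dict[str, Op], capacities: list[int]) -> dict[str, int]:
    """Greedy memory-constrained earliest-finish-time placement (illustrative)."""
    free = list(capacities)                  # remaining memory per device
    device_ready = [0.0] * len(capacities)   # time each device becomes idle
    finish: dict[str, float] = {}
    placement: dict[str, int] = {}
    # TopologicalSorter expects a mapping: node -> its predecessors.
    order = TopologicalSorter({n: set(op.preds) for n, op in ops.items()}).static_order()
    for name in order:
        op = ops[name]
        best = None  # (finish time, device)
        for d, cap in enumerate(free):
            if cap < op.memory:
                continue  # memory constraint: device d cannot hold this op
            # Inputs from another device arrive only after a transfer delay.
            ready = max(
                (finish[p] + (0.0 if placement[p] == d else comm)
                 for p, comm in op.preds.items()),
                default=0.0,
            )
            end = max(ready, device_ready[d]) + op.compute
            if best is None or end < best[0]:  # ties keep the lower device index
                best = (end, d)
        if best is None:
            raise MemoryError(f"no device has enough free memory for {name}")
        end, d = best
        placement[name] = d
        finish[name] = end
        device_ready[d] = end
        free[d] -= op.memory
    return placement


# Usage: a small diamond-shaped graph a -> {b, c} -> d on two devices.
ops = {
    "a": Op("a", compute=2.0, memory=4),
    "b": Op("b", compute=3.0, memory=4, preds={"a": 1.0}),
    "c": Op("c", compute=3.0, memory=4, preds={"a": 1.0}),
    "d": Op("d", compute=1.0, memory=2, preds={"b": 1.0, "c": 1.0}),
}
print(place(ops, capacities=[8, 8]))
```

In this example, memory pressure splits `b` and `c` across the two devices, which lets them run in parallel. A placer of this kind makes a single greedy pass over the graph instead of repeatedly evaluating candidate placements. That is one plausible intuition for why algorithmic approaches can plan so much faster than learning-based ones.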
Taxonomy
Topics: Cloud Computing and Resource Management · Ferroelectric and Negative Capacitance Devices · Advanced Graph Neural Networks
