A Computational-Graph Partitioning Method for Training Memory-Constrained DNNs
Fareed Qararyah, Mohamed Wahib, Doğa Dikbayır, Mehmet Esat Belviranli, Didem Unat

TL;DR
ParDNN is a generic, non-intrusive graph partitioning method that efficiently distributes large DNNs across multiple devices, enabling memory-constrained training with improved speed and scalability.
Contribution
It introduces a device-agnostic, automatic partitioning strategy for DNNs that optimizes memory usage and training time without modifying model or kernel implementations.
Findings
Successfully partitions billion-parameter models in seconds to minutes.
Achieves superlinear scaling in training throughput and batch size.
Outperforms existing partitioning methods in efficiency and scalability.
Abstract
Many state-of-the-art Deep Neural Networks (DNNs) have substantial memory requirements. Limited device memory becomes a bottleneck when training those models. We propose ParDNN, an automatic, generic, and non-intrusive partitioning strategy for DNNs that are represented as computational graphs. ParDNN decides a placement of a DNN's underlying computational graph operations across multiple devices so that the devices' memory constraints are met and the training time is minimized. ParDNN is completely independent of the deep learning aspects of a DNN. It requires no modification at either the model level or the systems-level implementation of its operation kernels. ParDNN partitions DNNs having billions of parameters and hundreds of thousands of operations in seconds to a few minutes. Our experiments with TensorFlow on 16 GPUs demonstrate efficient training of 5 very large models while…
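To make the problem statement in the abstract concrete, here is a minimal sketch of memory-constrained operation placement. It is not ParDNN's algorithm: it is a generic greedy list-scheduling heuristic, and the function name `greedy_place`, the uniform `comm_cost`, the per-operation cost and memory figures, and the assumption that an operation's memory stays allocated for the whole run are all hypothetical simplifications chosen for illustration.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def greedy_place(ops, deps, num_devices, mem_cap, comm_cost=1.0):
    """Toy placement heuristic (illustrative only, not ParDNN).

    ops:  {op_name: (compute_time, memory)}
    deps: {op_name: [predecessor op_names]}
    Returns (placement, makespan), where placement maps op_name -> device index.
    """
    mem_used = [0.0] * num_devices   # memory already assigned to each device
    dev_free = [0.0] * num_devices   # time at which each device becomes idle
    placement, finish = {}, {}

    graph = {op: deps.get(op, []) for op in ops}
    for op in TopologicalSorter(graph).static_order():
        cost, mem = ops[op]
        best = None
        for d in range(num_devices):
            if mem_used[d] + mem > mem_cap:
                continue  # placing op here would violate the memory constraint
            # Inputs from another device pay a (hypothetical) transfer cost.
            ready = max(
                (finish[p] + (comm_cost if placement[p] != d else 0.0)
                 for p in graph[op]),
                default=0.0,
            )
            end = max(ready, dev_free[d]) + cost
            if best is None or end < best[0]:
                best = (end, d)
        if best is None:
            raise MemoryError(f"no device has room for {op!r}")
        end, d = best
        placement[op], finish[op] = d, end
        dev_free[d] = end
        mem_used[d] += mem

    return placement, max(finish.values(), default=0.0)


# A diamond-shaped graph: a feeds b and c, which both feed d.
ops = {"a": (1.0, 2.0), "b": (2.0, 2.0), "c": (2.0, 2.0), "d": (1.0, 2.0)}
deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
placement, makespan = greedy_place(ops, deps, num_devices=2, mem_cap=4.0)
print(placement, makespan)
```

In this toy run a single device cannot hold all four operations (8 units of memory against a cap of 4), so the heuristic is forced to spread them over both devices and pay one cross-device transfer. A real partitioner would also have to model tensor lifetimes and measured kernel and transfer times, which this sketch ignores.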
