Distributed TensorFlow with MPI
Abhinav Vishnu, Charles Siegel, Jeffrey Daily

TL;DR
This paper extends Google TensorFlow to large-scale clusters using MPI, enabling efficient distributed machine learning with minimal runtime modifications, and demonstrates its effectiveness on an InfiniBand cluster.
Contribution
It introduces a generic MPI-based extension to TensorFlow for distributed execution on large clusters with minimal changes to the runtime.
Findings
Efficient distributed execution demonstrated on an InfiniBand cluster.
Minimal modifications needed for TensorFlow to support MPI-based distributed computing.
Effective handling of large datasets with the extended TensorFlow implementation.
Abstract
Machine Learning and Data Mining (MLDM) algorithms are becoming increasingly important in analyzing the large volumes of data generated by simulations, experiments and mobile devices. With increasing data volume, distributed memory systems (such as tightly connected supercomputers or cloud computing systems) are becoming important in designing in-memory and massively parallel MLDM algorithms. Yet, the majority of open source MLDM software is limited to sequential execution, with only a few packages supporting multi-core/many-core execution. In this paper, we extend the recently proposed Google TensorFlow for execution on large scale clusters using Message Passing Interface (MPI). Our approach requires minimal changes to the TensorFlow runtime -- making the proposed implementation generic and readily usable by the growing number of TensorFlow users. We evaluate our implementation using an InfiniBand cluster…
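The abstract does not say how work is divided among MPI processes. One common pattern for this kind of extension is data parallelism: each rank holds a replica of the model, computes gradients on its own shard of the data, and the ranks average those gradients with an allreduce before updating. The sketch below simulates that pattern in a single Python process so it runs without an MPI installation. It is an illustration under that assumption, not the authors' implementation; `local_gradient`, `allreduce_mean`, and `train` are hypothetical names, and the toy model (fitting y = w * x) is chosen only to keep the arithmetic visible.

```python
# Illustrative only: simulates MPI-style data parallelism in one process.
# Each "rank" is a list entry; allreduce_mean stands in for an MPI allreduce.
# Assumption: gradient averaging across ranks (not stated in the summary).

def local_gradient(w, shard):
    """Gradient of mean squared error for y ~ w * x on one rank's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values):
    """Stand-in for an allreduce (sum) followed by division by rank count.

    Every rank receives the same averaged value, as with a real allreduce.
    """
    total = sum(values)
    return [total / len(values)] * len(values)


def train(data, num_ranks=4, steps=200, lr=0.01):
    # Split samples round-robin so each simulated rank owns a disjoint shard.
    shards = [data[r::num_ranks] for r in range(num_ranks)]
    # Every rank starts from the same model replica.
    weights = [0.0] * num_ranks
    for _ in range(steps):
        # Each rank computes a gradient from its local shard only.
        grads = [local_gradient(weights[r], shards[r]) for r in range(num_ranks)]
        # Ranks exchange gradients; all replicas apply the same average.
        avg = allreduce_mean(grads)
        weights = [weights[r] - lr * avg[r] for r in range(num_ranks)]
    return weights


# Synthetic data following y = 3x, so the fitted weight should approach 3.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
final_weights = train(data)
```

Because every simulated rank applies the same averaged gradient, the replicas stay identical after each step, which is the property a real allreduce-based scheme relies on. In an actual MPI program the list of ranks would be replaced by separate processes and `allreduce_mean` by a collective call.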
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
