Towards a Scalable and Distributed Infrastructure for Deep Learning Applications

Bita Hasheminezhad; Shahrzad Shirzad; Nanmiao Wu; Patrick Diehl; Hannes Schulz; Hartmut Kaiser

arXiv:2010.03012·cs.DC·April 21, 2021

TL;DR

This paper introduces Phylanx, a scalable distributed deep learning infrastructure that translates Python code into efficient multi-node execution using fine-grained parallelism and task-based runtime systems.

Contribution

It presents a novel framework that enhances distributed deep learning by enabling fine-grained inter-node communication and efficient execution of Python code.

Findings

01. Phylanx improves scalability for deep learning workloads.

02. It enables efficient fine-grained communication across nodes.

03. The framework leverages C++ parallelism and concurrency libraries.

Abstract

Although recent scaling-up approaches to training deep neural networks have proven effective, the computational intensity of large and complex models, as well as the availability of large-scale datasets, requires deep learning frameworks to utilize scaling-out techniques. Parallelization approaches and distribution requirements were not considered in the preliminary designs of most available distributed deep learning frameworks, and most of them are still unable to perform effective and efficient fine-grained inter-node communication. We present Phylanx, which has the potential to alleviate these shortcomings. Phylanx offers a productivity-oriented frontend where user Python code is translated to a futurized execution tree that can be executed efficiently on multiple nodes using the C++ standard library for parallelism and concurrency (HPX), leveraging fine-grained threading and an…
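To make the phrase "futurized execution tree" more concrete, the following is a minimal single-process sketch using only Python's standard `concurrent.futures` module. It is not Phylanx code and does not use the HPX or Phylanx APIs: the tuple-based tree format and the names `OPS`, `futurize`, and `evaluate` are hypothetical, chosen only to illustrate the idea that each node of an expression tree becomes a future whose value depends on the futures of its children.

```python
from concurrent.futures import Future, ThreadPoolExecutor
import operator

# Hypothetical tree format (an assumption, not the Phylanx representation):
# a leaf is a plain number, an inner node is a tuple (op, left, right).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def futurize(node, pool: ThreadPoolExecutor) -> Future:
    """Return a Future that will hold the value of `node`.

    Children are submitted before their parent, so with the pool's FIFO
    queue a parent only ever waits on tasks scheduled ahead of it.
    """
    if not isinstance(node, tuple):
        # Leaf: wrap the constant in a future.
        return pool.submit(lambda: node)
    op, left, right = node
    lhs = futurize(left, pool)   # independent subtrees can run concurrently
    rhs = futurize(right, pool)
    # The parent task blocks only until both child futures are ready.
    return pool.submit(lambda: OPS[op](lhs.result(), rhs.result()))


def evaluate(tree, max_workers: int = 4):
    """Evaluate the whole tree and return the root value."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return futurize(tree, pool).result()


# (2 + 3) * (10 - 4)
tree = ("*", ("+", 2, 3), ("-", 10, 4))
print(evaluate(tree))
```

In this sketch the two subtrees `(2 + 3)` and `(10 - 4)` have no dependency on each other, so the pool is free to run them in parallel before the multiplication at the root. A distributed runtime would additionally place nodes on different machines, which this thread-based illustration does not attempt.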

Code & Models

Repositories

STEllAR-GROUP/phylanx (Official)
