Automatic Parallelization of Python Programs for Distributed Heterogeneous Computing
Jun Shirako, Akihiro Hayashi, Sri Raj Paul, Alexey Tumanov, Vivek Sarkar

TL;DR
This paper presents an automated approach that transforms sequential Python code into optimized parallel code for distributed heterogeneous systems. It relies on type hints, polyhedral optimizations, and the Ray runtime, and reports speedups of over 20,000× on 24 nodes with 144 GPUs.
Contribution
It introduces a novel AOT source-to-source transformation framework for Python that automates parallelization and optimization for heterogeneous distributed hardware.
Findings
Achieved over 20,000× speedup on supercomputers with 24 nodes and 144 GPUs.
Enabled automatic CPU/GPU code variant selection and high-level loop optimizations.
Demonstrated significant performance improvements in real-world applications.
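The CPU/GPU variant selection mentioned above can be pictured with a small cost-model sketch. This summary does not describe how the framework actually chooses between variants, so everything here is an assumption made for illustration: the names `DeviceModel`, `estimated_time`, and `select_variant` are hypothetical, and the throughput and overhead numbers are invented placeholders rather than measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceModel:
    """Hypothetical device description; all fields are illustrative."""
    flops_per_sec: float        # sustained compute throughput
    bytes_per_sec: float        # host-to-device transfer bandwidth
    launch_overhead_sec: float  # fixed cost of dispatching one kernel


def estimated_time(dev: DeviceModel, flops: float, bytes_moved: float) -> float:
    """Estimate run time as fixed overhead + transfer time + compute time."""
    return (dev.launch_overhead_sec
            + bytes_moved / dev.bytes_per_sec
            + flops / dev.flops_per_sec)


def select_variant(n: int, cpu: DeviceModel, gpu: DeviceModel) -> str:
    """Pick the cheaper variant for an n x n float64 matrix multiply."""
    flops = 2.0 * n ** 3          # one multiply and one add per inner step
    bytes_moved = 3 * n * n * 8   # two inputs and one output, 8 bytes each
    t_cpu = estimated_time(cpu, flops, bytes_moved)
    t_gpu = estimated_time(gpu, flops, bytes_moved)
    return "gpu" if t_gpu < t_cpu else "cpu"


# Placeholder numbers (assumptions): data is already in host memory,
# so the CPU pays no transfer cost and no launch overhead.
CPU = DeviceModel(flops_per_sec=1e10, bytes_per_sec=float("inf"),
                  launch_overhead_sec=0.0)
GPU = DeviceModel(flops_per_sec=1e12, bytes_per_sec=1e10,
                  launch_overhead_sec=1e-4)

small = select_variant(8, CPU, GPU)     # overhead dominates on the GPU
large = select_variant(2048, CPU, GPU)  # compute dominates on the CPU
```

Under these placeholder numbers the small problem stays on the CPU and the large one moves to the GPU, which is the kind of size-dependent trade-off a variant selector has to capture.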
Abstract
This paper introduces a novel approach to automatic ahead-of-time (AOT) parallelization and optimization of sequential Python programs for execution on distributed heterogeneous platforms. Our approach enables AOT source-to-source transformation of Python programs, driven by the inclusion of type hints for function parameters and return values. These hints can be supplied by the programmer or obtained by dynamic profiler tools; multi-version code generation guarantees the correctness of our AOT transformation in all cases. Our compilation framework performs automatic parallelization and sophisticated high-level code optimizations for the target distributed heterogeneous hardware platform. It includes extensions to the polyhedral framework that unify user-written loops and implicit loops present in matrix/tensor operators, as well as automated selection of CPU vs. GPU code variants.…
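One way to read the multi-version guarantee is as a runtime guard: the specialized code runs only when the actual arguments match the type hints it was compiled under, and the untouched original runs otherwise. The sketch below illustrates that idea under stated assumptions. The function names (`saxpy_original`, `saxpy_specialized`, `select_saxpy_variant`, `saxpy`) are hypothetical, the specialized body is a plain-Python stand-in for optimized code, and none of this is taken from the paper's generated output.

```python
from typing import Callable, List


def saxpy_original(a, x, y):
    """Original sequential code, kept unchanged as the safe fallback."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def saxpy_specialized(a: float, x: List[float], y: List[float]) -> List[float]:
    """Stand-in for the variant optimized under the hinted types."""
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out


def _matches_hints(a, x, y) -> bool:
    """Check that the actual arguments satisfy the assumed type hints."""
    return (
        isinstance(a, float)
        and isinstance(x, list)
        and isinstance(y, list)
        and len(x) == len(y)
        and all(isinstance(v, float) for v in x)
        and all(isinstance(v, float) for v in y)
    )


def select_saxpy_variant(a, x, y) -> Callable:
    """Return the specialized variant only when the hints hold."""
    return saxpy_specialized if _matches_hints(a, x, y) else saxpy_original


def saxpy(a, x, y):
    """Entry point: dispatch to whichever variant is valid for this call."""
    return select_saxpy_variant(a, x, y)(a, x, y)


hinted = saxpy(2.0, [1.0, 2.0], [3.0, 4.0])  # arguments match the hints
unhinted = saxpy(2, (1, 2), (3, 4))          # ints and tuples: falls back
```

Because the fallback is the original program, a wrong or incomplete hint costs performance but not correctness, which matches the claim that the transformation is safe in all cases.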
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Computational Physics and Python Applications
