PaSh: Light-touch Data-Parallel Shell Processing
Nikos Vasilakis (MIT), Konstantinos Kallas (University of Pennsylvania), Konstantinos Mamouras (Rice University), Achilleas Benetopoulos (Unaffiliated), Lazar Cvetković (University of Belgrade)

TL;DR
PaSh automatically parallelizes POSIX shell scripts: it translates a script into a dataflow graph, applies transformations that expose parallelism, and translates the graph back into a parallel script, with reported speedups of up to 61.1x.
Contribution
PaSh combines semantics-preserving program transformations, a lightweight command annotation language, and Unix-aware runtime primitives to parallelize shell scripts.
Findings
Achieved speedups up to 61.1x on real scripts
Developed an annotation language for parallelizability properties
Conducted a parallelizability study of POSIX and GNU commands that guides the annotation language and the aggregator library
Abstract
This paper presents PaSh, a system for parallelizing POSIX shell scripts. Given a script, PaSh converts it to a dataflow graph, performs a series of semantics-preserving program transformations that expose parallelism, and then converts the dataflow graph back into a script: one that adds POSIX constructs to explicitly guide parallelism, coupled with PaSh-provided Unix-aware runtime primitives for addressing performance- and correctness-related issues. A lightweight annotation language allows command developers to express key parallelizability properties about their commands. An accompanying parallelizability study of POSIX and GNU commands (two large and commonly used groups) guides the annotation language and optimized aggregator library that PaSh uses. Finally, PaSh's extensive evaluation over 44…
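The data-parallel idea the abstract describes (split a command's input, run copies of the command on the pieces, then combine the partial outputs with an aggregator) can be illustrated with a minimal Python sketch. This is not PaSh's implementation: the helper names (`split_lines`, `grep`, `parallel_stateless`, `parallel_sort`) are hypothetical, the commands are modeled as plain Python functions over lists of lines, and threads stand in for the parallel processes a shell would spawn.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def split_lines(lines, n):
    """Split lines into at most n contiguous chunks, preserving order."""
    size = max(1, -(-len(lines) // n))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def grep(pattern, lines):
    """Toy stand-in for `grep`: each line is handled independently."""
    return [line for line in lines if pattern in line]


def parallel_stateless(cmd, lines, n=4):
    """Per-line commands: run on each chunk, concatenate results in order."""
    chunks = split_lines(lines, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        parts = list(pool.map(cmd, chunks))
    return [line for part in parts for line in part]


def parallel_sort(lines, n=4):
    """`sort`-like commands need an aggregator: here, a k-way merge."""
    chunks = split_lines(lines, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        parts = list(pool.map(sorted, chunks))
    return list(heapq.merge(*parts))


if __name__ == "__main__":
    data = ["b x", "a y", "c x", "a x"]
    print(parallel_stateless(lambda chunk: grep("x", chunk), data, n=2))
    print(parallel_sort(data, n=3))
```

The two functions differ only in how partial outputs are combined: concatenation suffices when every line is processed independently, while a sort needs a merge step. That distinction is the kind of per-command property an annotation would have to capture.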
