An Order-Aware Dataflow Model for Parallel Unix Pipelines
Shivam Handa (MIT), Konstantinos Kallas (University of Pennsylvania), Nikos Vasilakis (MIT), Martin Rinard (MIT)

TL;DR
This paper introduces an order-aware dataflow model for parallel Unix pipelines, capturing complex semantics to enable correct and efficient parallelization of shell commands.
Contribution
It presents a novel order-aware dataflow model for Unix pipelines, formalizes translations, and implements a system that improves pipeline parallelization.
Findings
Proves correctness of data parallel transformations.
Evaluates the speedup achieved on 47 pipelines.
Formalizes translation between shell scripts and dataflow model.
Abstract
We present a dataflow model for modelling parallel Unix shell pipelines. To accurately capture the semantics of complex Unix pipelines, the dataflow model is order-aware, i.e., the order in which a node in the dataflow graph consumes inputs from different edges plays a central role in the semantics of the computation and therefore in the resulting parallelization. We use this model to capture the semantics of transformations that exploit data parallelism available in Unix shell computations and prove their correctness. We additionally formalize the translations from the Unix shell to the dataflow model and from the dataflow model back to a parallel shell script. We implement our model and transformations as the compiler and optimization passes of a system parallelizing shell pipelines, and use it to evaluate the speedup achieved on 47 pipelines.
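To make the notion of order-awareness concrete, the following is a minimal Python sketch, not the paper's formalism or implementation. It models a stateless, line-by-line command applied to contiguous chunks of the input, followed by a merge node that drains its input edges in a fixed order. All names here (`grep`, `split`, `cat_in_order`, `parallelize`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, List

Lines = List[str]
Command = Callable[[Lines], Lines]


def grep(pattern: str) -> Command:
    """A stateless, line-by-line command, similar in spirit to `grep pattern`."""
    return lambda lines: [ln for ln in lines if pattern in ln]


def split(lines: Lines, n: int) -> List[Lines]:
    """Cut the input into n contiguous chunks, preserving line order."""
    size = -(-len(lines) // n)  # ceiling division; 0 for empty input
    return [lines[i * size:(i + 1) * size] for i in range(n)]


def cat_in_order(chunks: List[Lines]) -> Lines:
    """Order-aware merge: drain edge 0 fully, then edge 1, and so on."""
    out: Lines = []
    for chunk in chunks:
        out.extend(chunk)
    return out


def parallelize(cmd: Command, lines: Lines, n: int) -> Lines:
    """Apply cmd to each chunk independently, then merge in edge order.

    The per-chunk calls are independent, so they could run concurrently;
    they run sequentially here to keep the sketch short.
    """
    return cat_in_order([cmd(chunk) for chunk in split(lines, n)])


data = ["b foo", "a", "c foo", "d", "e foo"]
sequential = grep("foo")(data)
parallel = parallelize(grep("foo"), data, 2)

# Consuming the same edges in a different order changes the result.
wrong_order = cat_in_order(list(reversed([grep("foo")(c) for c in split(data, 2)])))
```

In this sketch `parallel` equals `sequential` only because the merge node reads its edges in the same order in which the input was split; `wrong_order` contains the same lines in a different sequence. This is the kind of property an order-aware model has to track for a parallelizing transformation to preserve the original output.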
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Interconnection Networks and Systems
