Enhancing iteration performance on distributed task-based workflows
Alex Barcelo, Anna Queralt, Toni Cortes

TL;DR
This paper introduces SplIter, a mechanism that improves iteration performance in distributed task-based workflows by decoupling task granularity from data block size, leading to significant speedups.
Contribution
The paper proposes SplIter, a novel method to split collections into partitions without data transfer, enhancing performance and flexibility in task-based distributed programming models.
Findings
Achieves more than tenfold speedups over the baseline.
Effective across multiple applications and domains.
Compatible with frameworks like COMPSs and Dask.
Abstract
Task-based programming models have proven to be a robust and versatile way to approach the development of applications for distributed environments. They provide natural programming patterns with high performance. However, execution in this paradigm can be very sensitive to granularity (i.e., the number and execution length of tasks). Granularity is often linked to the block size of the data, and finding the optimal block size poses several challenges, as it requires in-depth knowledge of the computing environment. Our proposal is to supplement the task-based programming model with a new mechanism: our SplIter proposal. At its core, SplIter provides a transparent way to split a collection into partitions (logical groups of blocks, obtained without any transfers or data rearrangement), which can then be iterated. Tasks are linked to those partitions, which means that SplIter breaks…
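To make the granularity argument concrete, here is a minimal Python sketch, assuming each block is described by a metadata-only reference that records which node stores it, and assuming partitions are formed by grouping co-located blocks. `BlockRef`, `Partition`, `split`, `partial_sum` and `partitions_per_location` are hypothetical names for illustration only; they are not the SplIter, COMPSs or Dask API, and plain sequential calls stand in for the runtime's task scheduling. Because a partition holds only references, building it moves no data, and the number of tasks follows the number of partitions rather than the number of blocks.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class BlockRef:
    """Hypothetical handle to a data block: metadata only, no payload."""
    block_id: int
    location: str  # node that (by assumption) stores the block


@dataclass
class Partition:
    """Logical group of block references that live on the same node."""
    location: str
    blocks: List[BlockRef]

    def __iter__(self) -> Iterator[BlockRef]:
        # Iterating a partition yields its blocks one by one.
        return iter(self.blocks)


def split(collection: List[BlockRef], partitions_per_location: int = 1) -> List[Partition]:
    """Group block references by location; only references are regrouped, no data moves."""
    by_location: Dict[str, List[BlockRef]] = defaultdict(list)
    for ref in collection:
        by_location[ref.location].append(ref)

    partitions: List[Partition] = []
    for loc in sorted(by_location):
        refs = by_location[loc]
        # Never create more partitions than there are blocks on this node.
        k = max(1, min(partitions_per_location, len(refs)))
        for i in range(k):
            partitions.append(Partition(loc, refs[i::k]))  # round-robin chunks
    return partitions


def partial_sum(partition: Partition, store: Dict[int, List[float]]) -> float:
    """One 'task' per partition; a real runtime would run it on partition.location."""
    total = 0.0
    for ref in partition:
        total += sum(store[ref.block_id])
    return total


if __name__ == "__main__":
    nodes = ["node0", "node1", "node2"]
    refs = [BlockRef(i, nodes[i % 3]) for i in range(12)]   # 12 small blocks
    store = {i: [float(i)] * 10 for i in range(12)}          # simulated block contents
    parts = split(refs)
    print(len(refs), "blocks ->", len(parts), "tasks")       # 12 blocks -> 3 tasks
    print(sum(partial_sum(p, store) for p in parts))         # 660.0
```

With 12 blocks on 3 nodes, a one-task-per-block approach would spawn 12 tasks, while the partitioned version spawns 3 (or 6 with `partitions_per_location=2`) without changing the block size, which is the decoupling of task granularity from block size that the abstract describes.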
