A Programming Model for Hybrid Workflows: combining Task-based Workflows and Dataflows all-in-one
Cristian Ramon-Cortes, Francesc Lordan, Jorge Ejarque, Rosa M. Badia

TL;DR
This paper introduces a unified programming model that combines task-based workflows and dataflows, simplifying the development of complex e-Science applications involving simulations and high-performance data analytics.
Contribution
It extends task-based management systems to support continuous data streams, enabling hybrid workflows with a unified programming interface for diverse data types.
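As a rough illustration of the idea, the sketch below runs a producer task and a consumer task concurrently over a shared stream, then hands the result to an ordinary task. It uses only the Python standard library. `ToyStream`, `simulation`, `analytics`, and `run_hybrid_workflow` are hypothetical names invented for this example; they are not the COMPSs API or the paper's Distributed Stream Library, and the in-memory queue only stands in for a distributed stream.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyStream:
    """Hypothetical in-memory stream (an assumption, not the paper's library)."""

    def __init__(self):
        self._items = queue.Queue()
        self._closed = threading.Event()

    def publish(self, item):
        self._items.put(item)

    def close(self):
        self._closed.set()

    def is_closed(self):
        return self._closed.is_set()

    def poll(self, timeout=0.05):
        """Return every item currently available, possibly an empty list."""
        batch = []
        try:
            # Wait briefly for the first item, then drain without blocking.
            batch.append(self._items.get(timeout=timeout))
            while True:
                batch.append(self._items.get_nowait())
        except queue.Empty:
            pass
        return batch


def simulation(stream, steps):
    """Producer task: emits one value per step, then closes the stream."""
    for step in range(steps):
        stream.publish(step * step)
    stream.close()


def analytics(stream):
    """Consumer task: aggregates values while the producer is still running."""
    total = 0
    while True:
        # Read the closed flag before polling so no late item is missed.
        closed = stream.is_closed()
        batch = stream.poll()
        total += sum(batch)
        if closed and not batch:
            return total


def run_hybrid_workflow(steps=5):
    stream = ToyStream()
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Dataflow part: both tasks run at the same time, linked by the stream.
        consumer = pool.submit(analytics, stream)
        producer = pool.submit(simulation, stream, steps)
        producer.result()
        total = consumer.result()
        # Task-based part: a plain task that starts once its input is ready.
        return pool.submit(lambda t: {"sum_of_squares": t}, total).result()


if __name__ == "__main__":
    print(run_hybrid_workflow(5))
```

The point of the sketch is the mix of the two styles: the stream lets the consumer start before the producer finishes, while the final step behaves like a conventional task that waits for a complete input.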
Findings
Built a Distributed Stream Library integrated with COMPSs
Demonstrated support for heterogeneous data types in workflows
Enabled seamless combination of simulations and data analytics
Abstract
This paper aims to reduce the effort of learning, deploying, and integrating several frameworks when developing e-Science applications that combine simulations with High-Performance Data Analytics (HPDA). We propose a way to extend task-based management systems to support continuous input and output data, enabling the combination of task-based workflows and dataflows (hereafter, Hybrid Workflows) in a single programming model. Developers can therefore build complex Data Science workflows using whichever approach best fits their requirements. To illustrate the capabilities of Hybrid Workflows, we have built a Distributed Stream Library and a fully functional prototype extending COMPSs, a mature, general-purpose, task-based, parallel programming model. The library can be easily integrated with existing task-based frameworks to provide support for dataflows. Also, it provides a…
