YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering Workflow Information from Scripts
Timothy McPhillips, Tianhong Song, Tyler Kolisnik, Steve Aulenbach, Khalid Belhajjame, Kyle Bocinsky, Yang Cao, Fernando Chirigati, Saumen Dey, Juliana Freire, Deborah Huntzinger, Christopher Jones, David Koop, Paolo Missier, Mark Schildhauer, Christopher Schwalm, Yaxing Wei

TL;DR
YesWorkflow is a user-friendly, language-agnostic tool that enables scientists to annotate and visualize the workflow structure within scripts, bridging the gap between scripting flexibility and workflow management benefits.
Contribution
The paper introduces a novel annotation-based approach that extracts workflow information from scripts without requiring integration with a workflow engine.
Findings
Enables visualization of script workflows through annotations
Supports multiple scripting languages without modification
Facilitates provenance querying for scientific data products
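
To make the annotation idea concrete, here is a minimal sketch of how workflow structure might be recovered from comments alone. The tag names (@begin, @end, @in, @out) follow YesWorkflow's comment-annotation style, but the parser below is a hypothetical simplification written for illustration, not the tool's actual implementation; the function names and the example script are assumptions. Because only the tags are matched, the comment character of the host language (# or %) does not matter.

```python
import re

# Matches a tag and the name that follows it on the same line.
# The tag set here is a small assumed subset, chosen for illustration.
ANNOTATION = re.compile(r"@(begin|end|in|out)\b[ \t]+(\S+)")


def extract_annotations(script_text):
    """Return (keyword, name) pairs in order of appearance."""
    return [(m.group(1), m.group(2)) for m in ANNOTATION.finditer(script_text)]


def blocks_with_ports(annotations):
    """Group @in/@out names under the innermost enclosing @begin block."""
    blocks = {}
    stack = []
    for keyword, name in annotations:
        if keyword == "begin":
            blocks[name] = {"in": [], "out": []}
            stack.append(name)
        elif keyword == "end":
            if stack:
                stack.pop()
        elif stack:
            blocks[stack[-1]][keyword].append(name)
    return blocks


def dataflow_edges(blocks):
    """Link blocks where one block's output name is another block's input."""
    edges = []
    for src, src_ports in blocks.items():
        for name in src_ports["out"]:
            for dst, dst_ports in blocks.items():
                if dst != src and name in dst_ports["in"]:
                    edges.append((src, name, dst))
    return edges


# Hypothetical script mixing two comment styles (# and %).
EXAMPLE_SCRIPT = """
# @begin clean_data
# @in raw_readings
# @out cleaned_readings
cleaned = [r for r in raw if r is not None]
# @end clean_data

% @begin plot_results
% @in cleaned_readings
% @out figure
% @end plot_results
"""

if __name__ == "__main__":
    blocks = blocks_with_ports(extract_annotations(EXAMPLE_SCRIPT))
    for src, data, dst in dataflow_edges(blocks):
        print(f"{src} --[{data}]--> {dst}")
```

The resulting edge list is the kind of structure a graph renderer could draw or a provenance query could traverse. The script itself is never executed or modified.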
Abstract
Scientific workflow management systems offer features for composing complex computational pipelines from modular building blocks, for executing the resulting automated workflows, and for recording the provenance of data products resulting from workflow runs. Despite the advantages such features provide, many automated workflows continue to be implemented and executed outside of scientific workflow systems due to the convenience and familiarity of scripting languages (such as Perl, Python, R, and MATLAB), and to the high productivity many scientists experience when using these languages. YesWorkflow is a set of software tools that aim to provide such users of scripting languages with many of the benefits of scientific workflow systems. YesWorkflow requires neither the use of a workflow engine nor the overhead of adapting code to run effectively in such a system. Instead, YesWorkflow…
Taxonomy
Topics: Scientific Computing and Data Management · Research Data Management Practices · Distributed and Parallel Computing Systems
