or2yw: Modeling and Visualizing OpenRefine Histories as YesWorkflow Diagrams
Nikolaus Nova Parulian, Lan Li, Bertram Ludaescher

TL;DR
This paper introduces or2yw, a tool that converts OpenRefine data cleaning histories into YesWorkflow diagrams, enabling visualization, querying, and better understanding of workflows for transparency and reuse.
Contribution
The paper presents a novel method to automatically generate YesWorkflow models from OpenRefine operation histories, including linear and parallel workflow visualizations.
Findings
The generated YesWorkflow models make previously executed OpenRefine workflows more transparent.
Parallel models reveal which data cleaning steps are independent of one another.
The tool facilitates workflow reuse and documentation.
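One way to see why a parallel view can expose independent steps is to group operations by the column they touch. The sketch below is illustrative only and is not or2yw's implementation. It assumes each operation reads and writes only the column named in its `columnName` field, which is a simplification: operations that derive a new column from an existing one would need extra handling. The function name `column_chains`, the `<table>` placeholder, and the sample history are hypothetical.

```python
from collections import defaultdict


def column_chains(history):
    """Group operations by the column they touch, preserving execution order.

    Assumption (not from the paper): an operation depends only on the
    column named in its "columnName" field. Operations without that
    field are filed under the placeholder "<table>".
    """
    chains = defaultdict(list)
    for step, op in enumerate(history, start=1):
        column = op.get("columnName", "<table>")
        chains[column].append((step, op["op"]))
    return dict(chains)


# Hypothetical three-step history in the shape of an OpenRefine JSON export.
history = [
    {"op": "core/text-transform", "columnName": "name", "expression": "value.trim()"},
    {"op": "core/text-transform", "columnName": "date", "expression": "value.toDate()"},
    {"op": "core/mass-edit", "columnName": "name"},
]

chains = column_chains(history)
for column, steps in chains.items():
    print(column, steps)
```

Under this assumption, steps 1 and 3 form one chain on `name`, while step 2 on `date` is independent of both, even though the recorded history lists the three steps strictly in sequence.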
Abstract
OpenRefine is a popular open-source data cleaning tool. It allows users to export a previously executed data cleaning workflow in a JSON format for possible reuse on other datasets. We have developed or2yw, a novel tool that maps a JSON-formatted OpenRefine operation history to a YesWorkflow (YW) model, which then can be visualized and queried using the YW tool. The latter was originally developed to allow researchers a simple way to annotate their program scripts in order to reveal the workflow steps and dataflow dependencies implicit in those scripts. With or2yw the user can automatically generate YW models from OpenRefine operation histories, thus providing a 'workflow view' on a previously executed sequence of data cleaning operations. The or2yw tool can generate different types of YesWorkflow models, e.g., a linear model which mirrors the sequential execution order of operations…
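The mapping from an operation history to a linear YesWorkflow model can be illustrated with a minimal sketch. This is not or2yw's code or output format. It assumes the history is a JSON array of operation objects, each with an `op` field, and it emits standard YesWorkflow comment annotations (`@begin`, `@in`, `@out`, `@end`). The function name `linear_yw`, the `table0`, `table1`, ... naming scheme, and the sample history are hypothetical.

```python
import json


def linear_yw(history, workflow_name="cleaning_workflow"):
    """Render a list of operations as a linear chain of YesWorkflow blocks.

    Each operation becomes one block that consumes table<i> and produces
    table<i+1>, mirroring the sequential execution order of the history.
    """
    n = len(history)
    lines = [f"# @begin {workflow_name}", "# @in table0", f"# @out table{n}"]
    for i, op in enumerate(history):
        # Turn "core/text-transform" into an identifier-like block name.
        step = op["op"].split("/")[-1].replace("-", "_")
        lines += [
            f"# @begin {step}_{i}",
            f"# @in table{i}",
            f"# @out table{i + 1}",
            f"# @end {step}_{i}",
        ]
    lines.append(f"# @end {workflow_name}")
    return "\n".join(lines)


# Hypothetical two-step history in the shape of an OpenRefine JSON export.
history_json = """
[
  {"op": "core/text-transform", "columnName": "name", "expression": "value.trim()"},
  {"op": "core/column-rename", "oldColumnName": "name", "newColumnName": "full_name"}
]
"""

history = json.loads(history_json)
yw_text = linear_yw(history)
print(yw_text)
```

The resulting annotation text could then be passed to the YW tool for rendering. Threading a single table through every step is the simplest possible choice and deliberately hides column-level dependencies.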
Taxonomy
Topics: Natural Language Processing Techniques
