Indexing Execution Patterns in Workflow Provenance Graphs through Generalized Trie Structures
Esteban García-Cuesta (Data Science Laboratory, School of Architecture, Engineering and Design, Universidad Europea de Madrid, Spain), José M. Gómez-Pérez (Expert System, Spain)

TL;DR
This paper introduces a novel generalized trie structure to index and analyze scientific workflow execution provenance, improving search and analysis capabilities over existing methods.
Contribution
It presents a statistically enriched trie-based approach to exploit workflow provenance data, bridging the gap between workflow specifications and actual execution traces.
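To make the idea of a statistically enriched trie concrete, the following is a minimal sketch, not the authors' implementation. It assumes each workflow execution can be flattened into an ordered list of step names, and it keeps only a visit count per node as the "statistic". The names `TrieNode`, `TraceTrie`, `prefix_count`, and `next_step_probabilities`, as well as the sample traces, are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


class TrieNode:
    """One step of an execution path, annotated with how often it was seen."""

    def __init__(self):
        self.children = defaultdict(TrieNode)
        self.count = 0  # number of indexed traces passing through this node


class TraceTrie:
    """Prefix trie over execution traces (assumed: lists of step names)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, trace):
        """Index one trace, incrementing the count of every node on its path."""
        node = self.root
        node.count += 1
        for step in trace:
            node = node.children[step]
            node.count += 1

    def _find(self, prefix):
        """Return the node reached by following prefix, or None if absent."""
        node = self.root
        for step in prefix:
            if step not in node.children:  # membership test does not create keys
                return None
            node = node.children[step]
        return node

    def prefix_count(self, prefix):
        """Number of indexed traces that start with the given steps."""
        node = self._find(prefix)
        return node.count if node is not None else 0

    def next_step_probabilities(self, prefix):
        """Empirical distribution of the step that follows the given prefix."""
        node = self._find(prefix)
        if node is None:
            return {}
        total = sum(child.count for child in node.children.values())
        if total == 0:
            return {}
        return {step: child.count / total for step, child in node.children.items()}


# Hypothetical traces, for illustration only.
trie = TraceTrie()
for trace in (
    ["fetch", "clean", "align"],
    ["fetch", "clean", "plot"],
    ["fetch", "filter"],
):
    trie.insert(trace)

print(trie.prefix_count(["fetch", "clean"]))
print(trie.next_step_probabilities(["fetch"]))
```

This sketch indexes prefixes only. Matching a pattern that starts in the middle of an execution would require also inserting every suffix of each trace, which is one common way to generalize a trie over multiple sequences; whether that matches the paper's construction is an assumption.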
Findings
Outperforms SPARQL 1.1 Property Paths in querying provenance graphs
Enhances workflow search and analysis through new indexing techniques
Bridges the gap between workflow descriptions and execution data
Abstract
Over the last years, scientific workflows have become mature enough to be used in a production style. However, despite the increasing maturity, there is still a shortage of tools for searching, adapting, and reusing workflows that hinders a more generalized adoption by the scientific communities. Indeed, due to the limited availability of machine-readable scientific metadata and the heterogeneity of workflow specification formats and representations, new ways to leverage alternative sources of information that complement existing approaches are needed. In this paper we address such limitations by applying statistically enriched generalized trie structures to exploit workflow execution provenance information in order to assist the analysis, indexing and search of scientific workflows. Our method bridges the gap between the description of what a workflow is supposed to do according to its…
Taxonomy
Topics: Scientific Computing and Data Management · Research Data Management Practices · Distributed and Parallel Computing Systems
