Applying Process Mining on Scientific Workflows: a Case Study on High Performance Computing Data
Zahra Sadeghibogar, Alessandro Berti, Marco Pegoraro, Wil M.P. van der Aalst

TL;DR
This paper demonstrates how process mining techniques can be applied to scientific workflows in HPC environments by extracting and correlating job logs from SLURM systems to analyze complex data and control flows.
Contribution
It introduces novel methods for extracting and correlating HPC job logs, enabling process mining on scientific workflows with or without explicit job dependencies.
Findings
Develops methods for extracting and correlating SLURM job logs
Validates the approach experimentally, enabling workflow documentation
Supports identification of performance bottlenecks in HPC workflows
Abstract
Computer-based scientific experiments are becoming increasingly data-intensive, necessitating the use of High-Performance Computing (HPC) clusters to handle large scientific workflows. These workflows result in complex data and control flows within the system, making analysis challenging. This paper focuses on the extraction of case IDs from SLURM-based HPC cluster logs, a crucial step for applying mainstream process mining techniques. The core contribution is the development of methods to correlate jobs in the system, whether their interdependencies are explicitly specified or not. We present our log extraction and correlation techniques, supported by experiments that validate our approach, enabling comprehensive documentation of workflows and identification of performance bottlenecks.
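To make the idea of case ID extraction concrete, here is a minimal sketch of one way to correlate jobs whose dependencies are explicitly specified. It is not the authors' method: it assumes each job record has already been parsed into a dictionary with a hypothetical `job_id` field and a hypothetical `depends_on` list, and it treats every connected group of dependent jobs as one case, using union-find. Jobs without explicit dependencies would need a separate heuristic, which is not shown here.

```python
def assign_case_ids(jobs):
    """Map each job id to a case id (the smallest job id in its dependency group).

    `jobs` is a list of dicts with the hypothetical keys 'job_id' and
    'depends_on' (a list of job ids); this layout is an assumption,
    not a SLURM output format.
    """
    parent = {job["job_id"]: job["job_id"] for job in jobs}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as root so case ids are deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    for job in jobs:
        for dep in job["depends_on"]:
            if dep in parent:  # skip dependencies on jobs missing from the log
                union(job["job_id"], dep)

    return {job_id: find(job_id) for job_id in parent}


# Illustrative data only: two workflows, one with a dangling dependency (999).
jobs = [
    {"job_id": 101, "depends_on": []},
    {"job_id": 102, "depends_on": [101]},
    {"job_id": 103, "depends_on": [102]},
    {"job_id": 200, "depends_on": []},
    {"job_id": 201, "depends_on": [200, 999]},
]
case_ids = assign_case_ids(jobs)
```

In this sketch, jobs 101 to 103 share one case ID and jobs 200 and 201 share another, so each group could be treated as a single trace by a process mining tool.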
Taxonomy
Topics: Business Process Modeling and Analysis · Big Data and Business Intelligence · Service-Oriented Architecture and Web Services
