Applying Process Mining on Scientific Workflows: a Case Study on High Performance Computing Data

Zahra Sadeghibogar; Alessandro Berti; Marco Pegoraro; Wil M.P. van der Aalst

arXiv:2307.02833 · cs.DB · February 17, 2025 · 1 citation

Open Access

TL;DR

This paper demonstrates how process mining techniques can be applied to scientific workflows in HPC environments by extracting and correlating job logs from SLURM systems to analyze complex data and control flows.

Contribution

The paper introduces methods for extracting HPC job logs and correlating jobs into cases, enabling process mining on scientific workflows whether or not job dependencies are explicitly specified.

Findings

01 Effective log extraction and correlation methods developed

02 Validated approach enables workflow documentation

03 Identifies performance bottlenecks in HPC workflows

Abstract

Computer-based scientific experiments are becoming increasingly data-intensive, necessitating the use of High-Performance Computing (HPC) clusters to handle large scientific workflows. These workflows result in complex data and control flows within the system, making analysis challenging. This paper focuses on the extraction of case IDs from SLURM-based HPC cluster logs, a crucial step for applying mainstream process mining techniques. The core contribution is the development of methods to correlate jobs in the system, whether their interdependencies are explicitly specified or not. We present our log extraction and correlation techniques, supported by experiments that validate our approach, enabling comprehensive documentation of workflows and identification of performance bottlenecks.
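
To make the explicit-dependency case from the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes job records have already been parsed into dictionaries with the hypothetical fields `job_id` and `depends_on`, and it assumes that all jobs connected through dependencies belong to the same case. The function name `assign_case_ids` and the choice of the smallest job ID as the case ID are also illustrative assumptions.

```python
def assign_case_ids(jobs):
    """Map each job ID to a case ID by grouping dependency-connected jobs.

    `jobs` is a list of dicts with hypothetical keys:
      - "job_id": integer job identifier
      - "depends_on": list of job IDs this job explicitly depends on
    The case ID of a group is the smallest job ID in that group (an assumption).
    """
    # Union-find structure: every job starts as the root of its own group.
    parent = {job["job_id"]: job["job_id"] for job in jobs}

    def find(x):
        # Follow parent links to the group root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return
        # Keep the smaller ID as root so the case ID is deterministic.
        if root_a < root_b:
            parent[root_b] = root_a
        else:
            parent[root_a] = root_b

    for job in jobs:
        for dep in job["depends_on"]:
            # Ignore dependencies on jobs that are missing from the log.
            if dep in parent:
                union(job["job_id"], dep)

    return {job_id: find(job_id) for job_id in parent}


# Illustrative data only: two dependency chains and one standalone job.
example_jobs = [
    {"job_id": 101, "depends_on": []},
    {"job_id": 102, "depends_on": [101]},
    {"job_id": 103, "depends_on": [102]},
    {"job_id": 200, "depends_on": []},
    {"job_id": 201, "depends_on": [200]},
    {"job_id": 300, "depends_on": []},
]
case_ids = assign_case_ids(example_jobs)
```

The resulting mapping can serve as the case ID column of an event log. Jobs without explicit dependencies each end up in their own case here, so correlating them would need additional criteria that this sketch does not attempt.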

Taxonomy

Topics: Business Process Modeling and Analysis · Big Data and Business Intelligence · Service-Oriented Architecture and Web Services