Process Mining for Python (PM4Py): Bridging the Gap Between Process- and Data Science
Alessandro Berti, Sebastiaan J. van Zelst, Wil van der Aalst

TL;DR
PM4Py is a new Python library that enhances process mining by integrating with popular data science tools, enabling customizable algorithms and scalable analysis of event data.
Contribution
The paper introduces PM4Py, a comprehensive Python library that bridges process mining and data science, supporting algorithm customization and large-scale experimentation.
Findings
Provides integration with pandas, NumPy, SciPy, and scikit-learn
Offers architecture and functionality overview of PM4Py
Includes example applications demonstrating its capabilities
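To make the pandas integration listed above more concrete, here is a minimal sketch of how event data can be analysed as an ordinary DataFrame. It deliberately uses only pandas, not PM4Py's own API. The column names (`case_id`, `activity`, `timestamp`), the toy log, and the helper `directly_follows_counts` are illustrative assumptions and do not come from the paper.

```python
import pandas as pd

# Toy event log, one row per event. Column names and values are
# illustrative assumptions, not taken from the paper or from PM4Py.
events = pd.DataFrame(
    {
        "case_id": ["c1", "c1", "c1", "c2", "c2", "c2"],
        "activity": ["register", "check", "pay", "register", "pay", "check"],
        "timestamp": pd.to_datetime(
            [
                "2024-01-01 09:00",
                "2024-01-01 10:00",
                "2024-01-01 11:00",
                "2024-01-02 09:00",
                "2024-01-02 09:30",
                "2024-01-02 10:30",
            ]
        ),
    }
)


def directly_follows_counts(log: pd.DataFrame) -> pd.Series:
    """Count how often one activity is immediately followed by another in the same case."""
    # Order events chronologically within each case.
    ordered = log.sort_values(["case_id", "timestamp"], kind="stable")
    # For every event, look up the next activity of the same case (NaN for the last event).
    successor = ordered.groupby("case_id")["activity"].shift(-1)
    pairs = pd.DataFrame({"source": ordered["activity"], "target": successor}).dropna()
    # Aggregate identical (source, target) pairs into counts.
    return pairs.groupby(["source", "target"]).size()


dfg = directly_follows_counts(events)
print(dfg)
```

Because the result is a plain pandas Series indexed by (source, target), it can be filtered, joined, or passed on to NumPy or scikit-learn like any other tabular data, which is the kind of interoperability the summary describes.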
Abstract
Process mining, i.e., a sub-field of data science focusing on the analysis of event data generated during the execution of (business) processes, has seen a tremendous change over the past two decades. In the early 2000s there was limited to no tool support; nowadays, several software tools exist, both open-source (e.g., ProM and Apromore) and commercial (e.g., Disco, Celonis, and ProcessGold). The commercial process mining tools provide limited support for implementing custom algorithms. Moreover, both commercial and open-source process mining tools are often only accessible through a graphical user interface, which hampers their usage in large-scale experimental settings. Initiatives such as RapidProM provide process mining support in the scientific workflow-based data science suite RapidMiner. However, these offer limited to no support for algorithmic customization.…
Taxonomy
Topics: Business Process Modeling and Analysis · Service-Oriented Architecture and Web Services · Semantic Web and Ontologies
