Tailoring Machine Learning for Process Mining
Paolo Ceravolo, Sylvio Barbon Junior, Ernesto Damiani, Wil van der Aalst

TL;DR
This paper discusses the challenges of integrating machine learning into process mining, emphasizing the need for tailored data encoding and methodologies that respect process data constraints to improve model effectiveness.
Contribution
It provides an analysis of issues in applying machine learning to process data and proposes a foundation for developing aligned methodologies.
Findings
Highlights mismatch between ML assumptions and process data distributions
Emphasizes importance of data encoding in process mining
Calls for methodology grounded in process data characteristics
Abstract
Machine learning models are routinely integrated into process mining pipelines to carry out tasks like data transformation, noise reduction, anomaly detection, classification, and prediction. Often, the design of such models is based on some ad-hoc assumptions about the corresponding data distributions, which are not necessarily in accordance with the non-parametric distributions typically observed with process data. Moreover, the learning procedure they follow ignores the constraints concurrency imposes on process data. Data encoding is a key element to smooth the mismatch between these assumptions, but its potential is poorly exploited. In this paper, we argue that a deeper insight into the issues raised by training machine learning models with process data is crucial to ground a sound integration of process mining and machine learning. Our analysis of such issues is aimed at laying…
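As a rough illustration of why encoding choices matter for process data, the sketch below compares two simple trace encodings on a pair of traces that differ only in the order of two concurrent activities. It is not taken from the paper: the activity names, the two traces, and the helper functions `frequency_encoding` and `directly_follows_encoding` are all hypothetical, chosen only to make the point concrete.

```python
from collections import Counter

def frequency_encoding(trace, alphabet):
    """Count how often each activity occurs; ordering is discarded."""
    counts = Counter(trace)
    return [counts[a] for a in alphabet]

def directly_follows_encoding(trace, alphabet):
    """Count each ordered pair of consecutive activities in the trace."""
    pairs = Counter(zip(trace, trace[1:]))
    return [pairs[(a, b)] for a in alphabet for b in alphabet]

# Hypothetical activity alphabet; the two checks are assumed to run concurrently.
alphabet = ["register", "check_credit", "check_stock", "ship"]

# Two interleavings of the same concurrent behavior.
trace_1 = ["register", "check_credit", "check_stock", "ship"]
trace_2 = ["register", "check_stock", "check_credit", "ship"]

freq_1 = frequency_encoding(trace_1, alphabet)
freq_2 = frequency_encoding(trace_2, alphabet)
df_1 = directly_follows_encoding(trace_1, alphabet)
df_2 = directly_follows_encoding(trace_2, alphabet)

# The frequency encoding treats both interleavings as the same data point,
# while the directly-follows encoding separates them.
print("frequency vectors equal:", freq_1 == freq_2)
print("directly-follows vectors equal:", df_1 == df_2)
```

Under these assumptions, an order-sensitive encoding presents a downstream model with two distinct inputs for what is arguably one behavior, whereas the frequency encoding merges them but loses all control-flow information. Neither is correct in general; the trade-off depends on the task.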
Taxonomy
Topics: Business Process Modeling and Analysis · Data Quality and Management · Big Data and Business Intelligence
