Trace Encoding in Process Mining: a survey and benchmarking
Sylvio Barbon Jr., Paolo Ceravolo, Rafael S. Oyamada, Gabriel M. Tavares

TL;DR
This paper provides a comprehensive survey and benchmarking of 27 trace encoding methods in process mining, highlighting their expressivity, scalability, and domain independence to improve future research and application.
Contribution
It offers the most extensive comparison of encoding methods in process mining, addressing current practices and guiding better method selection and hyperparameter tuning.
Findings
Most encoding methods are used arbitrarily without thorough evaluation.
Default hyperparameters often lead to suboptimal performance.
The study highlights the importance of encoding choice in process mining pipelines.
Abstract
Encoding methods are employed across several process mining tasks, including predictive process monitoring, anomalous case detection, and trace clustering. These methods are usually applied as preprocessing steps and are responsible for transforming complex information into a numerical feature space. Most papers choose existing encoding methods arbitrarily or employ a strategy based on domain-specific expert knowledge. Moreover, existing methods are typically run with their default hyperparameters, without evaluating other options. This practice can lead to several drawbacks, such as suboptimal performance and unfair comparisons with the state of the art. Therefore, this work aims at providing a comprehensive survey on event log encoding by comparing 27 methods of different natures in terms of expressivity, scalability, correlation, and domain agnosticism. To the best of our…
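To make "transforming complex information into a numerical feature space" concrete, here is a minimal sketch of one simple encoding, a count-based (bag-of-activities) vector per trace. It is an illustration only: this excerpt does not say whether this particular encoding is among the 27 methods compared, and the function name `count_encode` and the toy log are hypothetical.

```python
from collections import Counter


def count_encode(traces):
    """Encode each trace as a vector of activity counts.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Vocabulary: every distinct activity label in the log, in sorted order,
    # so that each vector position always refers to the same activity.
    vocabulary = sorted({activity for trace in traces for activity in trace})

    vectors = []
    for trace in traces:
        counts = Counter(trace)  # missing activities count as 0
        vectors.append([counts[activity] for activity in vocabulary])
    return vocabulary, vectors


# Toy event log (assumed data): each trace is the ordered list of
# activities recorded for one case.
toy_log = [
    ["register", "check", "approve"],
    ["register", "check", "check", "reject"],
]

vocabulary, vectors = count_encode(toy_log)
print(vocabulary)  # ['approve', 'check', 'register', 'reject']
print(vectors)     # [[1, 1, 1, 0], [0, 2, 1, 1]]
```

This encoding discards the order of activities, which is one reason the choice of encoding matters for downstream tasks: two traces with the same activities in a different order map to the same vector.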
Taxonomy
Topics: Business Process Modeling and Analysis · Data Quality and Management · Big Data and Business Intelligence
