Turning Logs into Lumber: Preprocessing Tasks in Process Mining
Ying Liu, Vinicius Stein Dani, Iris Beerepoot, and Xixi Lu

TL;DR
This paper systematically reviews preprocessing tasks in process mining and identifies which tasks are common and which are less frequent, with the aim of improving the transparency and reliability of event log analysis.
Contribution
It provides the first comprehensive repository of preprocessing tasks and their usage in case studies, promoting more structured data preparation in process mining.
Findings
Six high-level preprocessing tasks identified
Twenty low-level preprocessing tasks identified
Log filtering, transformation, and abstraction are most common
Abstract
Event logs are invaluable for conducting process mining projects, offering insights into process improvement and data-driven decision-making. However, data quality issues affect the correctness and trustworthiness of these insights, making preprocessing tasks a necessity. Despite their recognized importance, preprocessing tasks are still executed ad hoc, with little support. This paper presents a systematic literature review that establishes a comprehensive repository of preprocessing tasks and their usage in case studies. We identify six high-level and 20 low-level preprocessing tasks in case studies. Log filtering, transformation, and abstraction are commonly used, while log enriching, integration, and reduction are less frequent. These results can be considered a first step toward more structured and transparent event log preprocessing, enhancing the reliability of process mining.
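To make the three most common task types more concrete, here is a minimal Python sketch on a toy event log. It is not taken from the paper. The event data, the function names (`transform`, `filter_complete_cases`, `abstract`), the `MAPPING` table, and the specific operation chosen to stand for each task (timestamp parsing, start/end case filtering, activity relabeling) are all illustrative assumptions.

```python
from datetime import datetime

# Hypothetical toy event log: one dict per event with a case id,
# an activity label, and an ISO 8601 timestamp string.
EVENTS = [
    {"case": "c1", "activity": "Receive order", "time": "2024-01-02T09:00:00"},
    {"case": "c1", "activity": "Check stock", "time": "2024-01-02T09:30:00"},
    {"case": "c1", "activity": "Ship order", "time": "2024-01-03T14:00:00"},
    {"case": "c2", "activity": "Receive order", "time": "2024-01-02T10:00:00"},
    {"case": "c2", "activity": "Check stock", "time": "2024-01-02T10:15:00"},
    {"case": "c3", "activity": "Check stock", "time": "2024-01-04T08:00:00"},
    {"case": "c3", "activity": "Ship order", "time": "2024-01-04T12:00:00"},
]

# Hypothetical mapping from low-level activities to higher-level ones.
MAPPING = {
    "Receive order": "Order handling",
    "Check stock": "Order handling",
    "Ship order": "Fulfilment",
}


def transform(events):
    """Log transformation (assumed example): parse timestamp strings into datetime objects."""
    return [{**e, "time": datetime.fromisoformat(e["time"])} for e in events]


def filter_complete_cases(events, start, end):
    """Log filtering (assumed example): keep only cases whose first event is `start`
    and whose last event is `end`, ordering each trace by timestamp."""
    traces = {}
    for e in sorted(events, key=lambda e: (e["case"], e["time"])):
        traces.setdefault(e["case"], []).append(e)
    complete = {
        case for case, trace in traces.items()
        if trace[0]["activity"] == start and trace[-1]["activity"] == end
    }
    # Preserve the original event order for the surviving cases.
    return [e for e in events if e["case"] in complete]


def abstract(events, mapping):
    """Log abstraction (assumed example): relabel activities via `mapping`,
    leaving unmapped activities unchanged."""
    return [{**e, "activity": mapping.get(e["activity"], e["activity"])} for e in events]


if __name__ == "__main__":
    log = transform(EVENTS)
    log = filter_complete_cases(log, "Receive order", "Ship order")
    log = abstract(log, MAPPING)
    print([(e["case"], e["activity"]) for e in log])
    # [('c1', 'Order handling'), ('c1', 'Order handling'), ('c1', 'Fulfilment')]
```

Only case c1 survives the filter: c2 never reaches "Ship order", and c3 does not start with "Receive order". Running the transformation first is a deliberate choice in this sketch, since the filter then sorts each trace by real timestamps rather than by strings.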
Taxonomy
Topics: Business Process Modeling and Analysis · Data Quality and Management · Big Data and Business Intelligence
