Model-driven Stochastic Trace Clustering
Jari Peeperkorn, Johannes De Smedt, Jochen De Weerdt

TL;DR
This paper introduces a stochastic, model-driven trace clustering method that enhances process model interpretability and captures real-world dynamics by optimizing cluster-specific stochastic process models using entropic relevance.
Contribution
The paper presents a novel stochastic process model-based trace clustering approach that improves interpretability and captures execution variability, outperforming existing methods in stochastic coherence.
Findings
Yields superior stochastic coherence and graph simplicity.
Scales linearly with input size, ensuring computational efficiency.
Highlights a trade-off between stochastic coherence and traditional fitness metrics.
Abstract
Process discovery algorithms automatically extract process models from event logs, but high variability often results in complex and hard-to-understand models. To mitigate this issue, trace clustering techniques group process executions into clusters, each represented by a simpler and more understandable process model. Model-driven trace clustering improves on this by assigning traces to clusters based on their conformity to cluster-specific process models. However, most existing clustering techniques rely either on no process model discovery at all or on non-stochastic models, neglecting the frequency or probability of activities and transitions and thereby limiting their ability to capture real-world execution dynamics. We propose a novel model-driven trace clustering method that optimizes stochastic process models within each cluster. Our approach uses entropic relevance, a stochastic…
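To make the idea of stochastic, model-driven assignment concrete, here is a minimal sketch. It assumes each cluster's stochastic model is given directly as a stochastic language (a dict mapping complete traces to probabilities). It also assumes the common entropic relevance formulation with a uniform background model: a fitting trace costs -log2 of its model probability, a non-fitting trace costs (|t|+1)·log2(|Λ|+1) bits, and the binary entropy of the fitting fraction is added once. The names `trace_cost`, `entropic_relevance`, and `assign_traces`, and the rule "move each trace to the cluster whose model encodes it most cheaply", are illustrative assumptions. They are not the authors' implementation, and the per-cluster model discovery step is omitted.

```python
import math
from collections import Counter

Trace = tuple[str, ...]
StochasticModel = dict[Trace, float]  # assumption: model given as trace -> probability


def binary_entropy(p: float) -> float:
    """H0(p) in bits; zero at the extremes."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def trace_cost(trace: Trace, model: StochasticModel, alphabet: set[str]) -> float:
    """Bits needed to encode one trace: model code if it fits, else uniform background code."""
    p = model.get(trace, 0.0)
    if p > 0.0:
        return -math.log2(p)
    # Background code: each activity plus an end marker costs log2(|alphabet| + 1) bits.
    return (len(trace) + 1) * math.log2(len(alphabet) + 1)


def entropic_relevance(log: list[Trace], model: StochasticModel, alphabet: set[str]) -> float:
    """Average bits per trace (lower is better), uniform-background variant."""
    counts = Counter(log)
    n = len(log)
    fitting = sum(c for t, c in counts.items() if model.get(t, 0.0) > 0.0)
    total = sum(c * trace_cost(t, model, alphabet) for t, c in counts.items())
    # Selector cost: entropy of the "fits / does not fit" flag, plus the mean encoding cost.
    return binary_entropy(fitting / n) + total / n


def assign_traces(log: list[Trace], models: list[StochasticModel], alphabet: set[str]) -> list[int]:
    """Hypothetical reassignment step: index of the cheapest-encoding cluster model per trace."""
    return [
        min(range(len(models)), key=lambda k: trace_cost(t, models[k], alphabet))
        for t in log
    ]


if __name__ == "__main__":
    log = [("a", "b")] * 3 + [("a", "c")]
    alphabet = {"a", "b", "c"}
    models = [{("a", "b"): 1.0}, {("a", "c"): 1.0}]
    print(assign_traces(log, models, alphabet))          # [0, 0, 0, 1]
    print(entropic_relevance(log, models[0], alphabet))  # about 2.3113 bits per trace
```

In a full clustering loop one would alternate this reassignment with rediscovering a stochastic model per cluster until assignments stabilize. Scoring each cluster with entropic relevance then rewards models that assign high probability to their own traces, not merely models that can replay them.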
Taxonomy
Topics: Business Process Modeling and Analysis · Software System Performance and Reliability · Data Visualization and Analytics
