Relational Algebra for In-Database Process Mining
Remco Dijkman, Juntao Gao, Paul Grefen, Arthur ter Hofstede

TL;DR
This paper formally defines a relational algebra operator that extracts the 'directly follows' relation from an operational database, enabling more flexible and efficient in-database process mining without an intermediate flat file.
Contribution
It formally defines a 'directly follows' operator in relational algebra, facilitating in-database process mining and query optimization.
Findings
Defines a 'directly follows' operator using relational algebra.
Proves equivalence properties for query optimization.
Analyzes the time complexity of the operator.
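To make the 'directly follows' relation concrete, here is a minimal sketch of how it could be computed inside a database with a plain self-join. It is an illustration, not the operator as the paper defines it. The table name `log`, the columns `case_id`, `activity`, and `ts`, and both function names are assumptions made for this example. The sketch also assumes timestamps are distinct within a case.

```python
import sqlite3


def build_example_log():
    """Create a small in-memory event log (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (case_id INTEGER, activity TEXT, ts INTEGER)")
    conn.executemany(
        "INSERT INTO log VALUES (?, ?, ?)",
        [
            (1, "A", 1), (1, "B", 2), (1, "C", 3),  # case 1: A, B, C
            (2, "A", 1), (2, "C", 2),               # case 2: A, C
        ],
    )
    return conn


def directly_follows(conn):
    """Return (case_id, activity, next_activity) triples.

    Event b directly follows event a when both belong to the same case,
    a happens before b, and no third event of that case lies between them.
    """
    query = """
        SELECT a.case_id, a.activity, b.activity
        FROM log AS a
        JOIN log AS b
          ON a.case_id = b.case_id AND a.ts < b.ts
        WHERE NOT EXISTS (
            SELECT 1 FROM log AS c
            WHERE c.case_id = a.case_id
              AND a.ts < c.ts AND c.ts < b.ts
        )
        ORDER BY a.case_id, a.ts
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in directly_follows(build_example_log()):
        print(row)
```

Because the result is an ordinary relation, it can be filtered, grouped, or joined with other tables in the same query, which is the kind of flexibility in-database mining aims for.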
Abstract
The execution logs that are used for process mining in practice are often obtained by querying an operational database and storing the result in a flat file. Consequently, the data processing power of the database system cannot be used anymore for this information, leading to constrained flexibility in the definition of mining patterns and limited execution performance in mining large logs. Enabling process mining directly on a database - instead of via intermediate storage in a flat file - therefore provides additional flexibility and efficiency. To help facilitate this ideal of in-database process mining, this paper formally defines a database operator that extracts the 'directly follows' relation from an operational database. This operator can both be used to do in-database process mining and to flexibly evaluate process mining related queries, such as: "which employee most…
Taxonomy
Topics: Business Process Modeling and Analysis · Advanced Database Systems and Queries · Semantic Web and Ontologies
