Mathematical Foundations of Modeling ETL Process Chains
Levin Maier, Lucas Schulze, Robert Lilow, Lukas Hahn, Nikola Krasowski, Arnulf Barth, Sebastian Gaebel, Ferdi Güran, Oliver Hanau, Giovanni Wagner, Falk Borgmann, Oleg Arenz, Jan Peters

TL;DR
This paper introduces a mathematical model of ETL process chains that accurately predicts time-aggregated throughput and supports efficient simulation, accounting for resource allocation, stochastic processing times, and bottlenecks.
Contribution
It develops a novel framework based on a controlled discrete-time Markov process for modeling ETL chains, capturing the effects of resource allocation, stochastic variability, and bottlenecks to improve simulation and control.
Findings
Model accurately predicts time-aggregated throughput at the chain level.
Flow Balance postulate links threads, throughput, and processing time.
Framework enables efficient simulation and resource optimization.
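The summary does not state the exact form of the Flow Balance postulate. A minimal sketch, assuming it takes the familiar Little's-law shape for threads that are always busy in steady state: throughput = threads / mean per-record processing time. The function names below are hypothetical, and exponential processing times are an illustrative assumption; the Monte Carlo check simply counts completions of independent busy threads over a long horizon.

```python
import random


def flow_balance_throughput(threads: int, mean_time: float) -> float:
    """Assumed flow-balance relation: records per unit time for `threads` always-busy workers."""
    return threads / mean_time


def simulated_throughput(threads: int, mean_time: float, horizon: float, seed: int = 0) -> float:
    """Count records finished by independent busy threads with exponential per-record times."""
    rng = random.Random(seed)
    completed = 0
    for _ in range(threads):
        t = rng.expovariate(1.0 / mean_time)  # finish time of this thread's first record
        while t <= horizon:
            completed += 1
            t += rng.expovariate(1.0 / mean_time)  # thread immediately starts the next record
    return completed / horizon


if __name__ == "__main__":
    n, tau, horizon = 8, 0.5, 10_000.0
    print(flow_balance_throughput(n, tau))  # 16.0
    print(simulated_throughput(n, tau, horizon))  # close to 16.0
```

Under this assumption, doubling the threads of a phase doubles its throughput only as long as the phase is never starved of input, which is exactly where chain-level bottlenecks enter.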
Abstract
Extract-Transform-Load (ETL) processes are core components of modern data processing infrastructures. The throughput of processed data records can be adjusted by changing the amount of allocated resources, i.e., the number of parallel processing threads for each of the three ETL phases, but also depends on stochastic variations in the per-record processing times. In chains of multiple consecutive ETL processes, the relation between allocated resources and overall throughput is further complicated, for example by the occurrence of bottlenecks affecting all subsequent ETL processes. We develop a mathematical model of ETL process chains that is accurate at the level of time-aggregated throughput and suitable for efficient simulation. The process chain is represented as a controlled discrete-time Markov process on a directed acyclic graph whose edges are individual ETL processes. We model…
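The abstract is truncated before the model details, so the following is only an illustrative sketch of what a controlled discrete-time Markov process on a DAG of ETL processes could look like, not the authors' model. Every choice here is an assumption: the names `ETLProcess`, `phase_capacity`, `simulate`, and `chain` are hypothetical; the state is the queue of records waiting in front of each phase of each edge; per-step processing times are lognormal; records reaching a node are copied to every outgoing process; edges are listed in topological order; and the per-phase thread counts play the role of the control input. Because the next state depends only on the current queues, the controls, and fresh random draws, the process is Markov.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class ETLProcess:
    """Hypothetical edge type: one ETL process moving records from node `src` to node `dst`."""
    src: str
    dst: str
    threads: tuple[int, int, int]          # control input: threads for (extract, transform, load)
    mean_time: tuple[float, float, float]  # mean per-record processing time of each phase
    # Records waiting in front of the extract, transform, and load phase, respectively.
    queues: list[int] = field(default_factory=lambda: [0, 0, 0])


def phase_capacity(rng: random.Random, threads: int, mean_time: float,
                   dt: float, sigma: float = 0.3) -> int:
    """Records one phase can finish in a step; the step's per-record time is lognormal with mean `mean_time`."""
    mu = math.log(mean_time) - sigma ** 2 / 2
    tau = rng.lognormvariate(mu, sigma)
    return int(threads * dt / tau)


def simulate(edges: list[ETLProcess], source: str, arrivals_per_step: int,
             steps: int, dt: float = 1.0, seed: int = 0) -> float:
    """Time-aggregated throughput (records per unit time) reaching the sink nodes."""
    rng = random.Random(seed)
    out_edges: dict[str, list[ETLProcess]] = {}
    for e in edges:
        out_edges.setdefault(e.src, []).append(e)

    delivered = 0
    for _ in range(steps):
        # New records enter at the source node; every outgoing process receives a copy.
        for e in out_edges.get(source, []):
            e.queues[0] += arrivals_per_step
        for e in edges:  # edges are assumed to be listed in topological order
            for phase in range(3):
                cap = phase_capacity(rng, e.threads[phase], e.mean_time[phase], dt)
                moved = min(cap, e.queues[phase])
                e.queues[phase] -= moved
                if phase < 2:
                    e.queues[phase + 1] += moved  # hand over to the next phase
                elif out_edges.get(e.dst):
                    for d in out_edges[e.dst]:    # hand records to downstream processes
                        d.queues[0] += moved
                else:
                    delivered += moved            # e.dst is a sink node
    return delivered / (steps * dt)


def chain(bottleneck_threads: int) -> list[ETLProcess]:
    """Two-process chain A -> B -> C whose second process gets `bottleneck_threads` per phase."""
    k = bottleneck_threads
    return [
        ETLProcess("A", "B", (4, 4, 4), (1.0, 1.0, 1.0)),
        ETLProcess("B", "C", (k, k, k), (1.0, 1.0, 1.0)),
    ]


if __name__ == "__main__":
    for k in (1, 4):
        print(k, simulate(chain(k), "A", arrivals_per_step=3, steps=2000))
```

With a single thread per phase in the second process, records pile up in front of it and the chain's throughput falls well below the arrival rate even though the first process has spare capacity; raising that process's threads to four lets the chain keep up with the three records arriving per step. A controller could adjust `threads` between steps to explore resource allocations.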
