METL: a modern ETL pipeline with a dynamic mapping matrix
Christian Haase, Timo Röseler, Mattias Seidel

TL;DR
This paper introduces METL, a modern ETL streaming pipeline utilizing a dynamic mapping matrix (DMM) for efficient, automated, and near real-time data transformation to a canonical data model, improving data integration in microservice architectures.
Contribution
The paper presents a novel dynamic mapping matrix (DMM) based on permutation matrices, enabling automated updates, parallel computation, and efficient compacting within ETL pipelines.
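The summary above does not spell out how the DMM is constructed, so the following is only a minimal sketch of the general idea of a permutation-style mapping, not the authors' implementation. It assumes a 0/1 matrix with at most one 1 per row and per column, where rows stand for CDM attributes and columns for source attributes. All field names and helper functions (`build_matrix`, `compact`, `apply_mapping`) are hypothetical.

```python
# Hypothetical field names, chosen for illustration only.
SOURCE_FIELDS = ["mail", "created", "cust_id"]
CDM_FIELDS = ["customer.id", "customer.email", "customer.created_at", "customer.phone"]


def build_matrix(pairs, n_rows, n_cols):
    """Build a 0/1 matrix with at most one 1 per row and per column.

    Each (row, col) pair says: CDM attribute `row` is fed by source attribute `col`.
    """
    matrix = [[0] * n_cols for _ in range(n_rows)]
    used_rows, used_cols = set(), set()
    for row, col in pairs:
        if row in used_rows or col in used_cols:
            raise ValueError("mapping is not a partial permutation")
        matrix[row][col] = 1
        used_rows.add(row)
        used_cols.add(col)
    return matrix


def compact(matrix):
    """Sparse form: keep only the column index of the single 1 in each non-empty row."""
    return {r: row.index(1) for r, row in enumerate(matrix) if 1 in row}


def apply_mapping(sparse, record_values, n_rows):
    """Map one source record to CDM order.

    Every output row depends on exactly one input column, so rows are
    independent of each other and could be computed in parallel.
    Unmapped CDM attributes are filled with None.
    """
    return [record_values[sparse[r]] if r in sparse else None for r in range(n_rows)]


# customer.id <- cust_id, customer.email <- mail, customer.created_at <- created
pairs = [(0, 2), (1, 0), (2, 1)]
matrix = build_matrix(pairs, len(CDM_FIELDS), len(SOURCE_FIELDS))
sparse = compact(matrix)
cdm_record = apply_mapping(sparse, ["a@example.com", "2024-01-01", 42], len(CDM_FIELDS))
```

In this sketch, a schema change in a source system would amount to editing a few entries of `pairs` and rebuilding the sparse form, which is one way to read the "automated updates" and "efficient compacting" claims; how METL actually does this is described in the paper itself.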
Findings
DMM allows near real-time schema updates.
METL efficiently handles data from 80+ microservices.
The approach improves data integration and transformation efficiency.
Abstract
Modern ETL streaming pipelines extract data from various sources and forward it to multiple consumers, such as data warehouses (DW) and analytical systems that leverage machine learning (ML). However, the increasing number of systems that are connected to such pipelines requires new solutions for data integration. The canonical (or common) data model (CDM) offers such an integration. It is particularly useful for integrating microservice systems into ETL pipelines (Villaca et al. 2020; Oliveira et al. 2019). However, a mapping to a CDM is complex (Lemcke et al. 2012). There are three complexity problems: the size of the required mapping matrix, the automation of updates to the matrix in response to changes in the extraction sources, and the time efficiency of the mapping. In this paper, we present a new solution for these problems. More precisely, we present a new dynamic mapping…
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Caching and Content Delivery
