FlowETL: An Autonomous Example-Driven Pipeline for Data Engineering
Mattia Di Profio, Mingjun Zhong, Yaji Sripada, and Marcel Jaspars

TL;DR
FlowETL is an example-driven ETL pipeline that autonomously designs and applies the transformations needed to standardize datasets, reducing human intervention and improving generalization across diverse data sources.
Contribution
The paper introduces FlowETL, an autonomous ETL system that generates transformation plans from input-output examples, advancing automation in data engineering.
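The idea of deriving a transformation plan from input-output examples can be illustrated with a minimal Python sketch. This is not FlowETL's implementation: the candidate transformations, the plan format, and the function names `infer_plan` and `apply_plan` are all hypothetical, chosen only to show how a small paired sample can determine a plan that is then applied to unseen rows.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
# Hypothetical plan format: target column -> (source column, transform name).
Plan = Dict[str, Tuple[str, str]]

# Assumed, deliberately tiny search space of string transformations.
CANDIDATES: Dict[str, Callable[[str], str]] = {
    "identity": lambda v: v,
    "strip": lambda v: v.strip(),
    "lower": lambda v: v.strip().lower(),
    "upper": lambda v: v.strip().upper(),
    "title": lambda v: v.strip().title(),
}


def infer_plan(source_sample: List[Row], target_sample: List[Row]) -> Plan:
    """Find, for each target column, a source column and transform that
    reproduce every example value in the paired sample."""
    plan: Plan = {}
    for tgt_col in target_sample[0]:
        expected = [row[tgt_col] for row in target_sample]
        found = None
        for src_col in source_sample[0]:
            for name, fn in CANDIDATES.items():
                if [fn(row[src_col]) for row in source_sample] == expected:
                    found = (src_col, name)
                    break
            if found:
                break
        if found is None:
            raise ValueError(f"no candidate reproduces column {tgt_col!r}")
        plan[tgt_col] = found
    return plan


def apply_plan(plan: Plan, rows: List[Row]) -> List[Row]:
    """Apply an inferred plan to rows that were not part of the sample."""
    return [
        {tgt: CANDIDATES[name](row[src]) for tgt, (src, name) in plan.items()}
        for row in rows
    ]


# Paired sample: the same two records before and after standardization.
source_sample = [
    {"Full Name": " ada lovelace ", "CTRY": "uk"},
    {"Full Name": "alan turing", "CTRY": "uk"},
]
target_sample = [
    {"name": "Ada Lovelace", "country": "UK"},
    {"name": "Alan Turing", "country": "UK"},
]

plan = infer_plan(source_sample, target_sample)
cleaned = apply_plan(plan, [{"Full Name": "grace hopper ", "CTRY": "us"}])
```

The exhaustive search here is only practical for a handful of columns and transformations; it stands in for whatever plan-construction strategy a real system would use.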
Findings
Demonstrates successful generalization across 14 diverse datasets.
Achieves automated data transformation with minimal human input.
Provides an observable and adaptable ETL pipeline architecture.
Abstract
The Extract, Transform, Load (ETL) workflow is fundamental for populating and maintaining data warehouses and other data stores accessed by analysts for downstream tasks. A major shortcoming of modern ETL solutions is the extensive need for a human-in-the-loop, required to design and implement context-specific and often non-generalisable transformations. While related work in the field of ETL automation shows promising progress, there is a lack of solutions capable of automatically designing and applying these transformations. We present FlowETL, a novel example-based autonomous ETL pipeline architecture designed to automatically standardise and prepare input datasets according to a concise, user-defined target dataset. FlowETL is an ecosystem of components which interact to achieve the desired outcome. A Planning Engine uses a sample of paired input-output datasets to construct…
Taxonomy
Topics: Advanced Database Systems and Queries · Business Process Modeling and Analysis · Semantic Web and Ontologies
