Towards Evolution Capabilities in Data Pipelines
Kevin M. Kramer

TL;DR
This paper emphasizes the importance of incorporating evolution capabilities into data pipeline frameworks to handle structural and semantic changes over time, proposing a conceptual model for self-awareness and self-adaptation.
Contribution
It introduces a requirements model for evolution capabilities in data pipelines, addressing a major gap in existing frameworks.
Findings
Identifies the need for evolution capabilities in data pipelines.
Provides a conceptual requirements model for self-awareness and self-adaptation.
Lays the foundation for a framework to manage evolutionary change.
Abstract
Evolutionary change over time in the context of data pipelines is certain, especially with regard to the structure and semantics of data as well as to the pipeline operators. Dealing with these changes, i.e., providing long-term maintenance, is costly. The present work explores the need for evolution capabilities within pipeline frameworks. In this context, dealing with evolution is defined as a two-step process consisting of self-awareness and self-adaptation. Furthermore, a conceptual requirements model is provided, which encompasses criteria for self-awareness and self-adaptation and covers the dimensions data, operator, pipeline, and environment. The lack of these capabilities in existing frameworks exposes a major gap. Filling this gap will be a significant contribution for practitioners and scientists alike. The present work envisions and lays the foundation for a framework which…
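To make the two-step process concrete, the following is a minimal sketch and not the paper's model or implementation. It assumes a schema can be represented as a plain mapping from column name to type name, and it covers only the data dimension. All names (`Dimension`, `Change`, `detect_changes`, `adapt`) are hypothetical and introduced for illustration only.

```python
from dataclasses import dataclass
from enum import Enum


class Dimension(Enum):
    """The four dimensions named in the abstract (hypothetical encoding)."""
    DATA = "data"
    OPERATOR = "operator"
    PIPELINE = "pipeline"
    ENVIRONMENT = "environment"


@dataclass(frozen=True)
class Change:
    """One detected change, tagged with the dimension it belongs to."""
    dimension: Dimension
    description: str


def detect_changes(expected_schema: dict, observed_schema: dict) -> list:
    """Self-awareness step (assumed form): report structural schema differences."""
    changes = []
    # Columns present in the observed data but not expected.
    for name in sorted(observed_schema.keys() - expected_schema.keys()):
        changes.append(Change(Dimension.DATA, f"added column {name}"))
    # Columns expected but missing from the observed data.
    for name in sorted(expected_schema.keys() - observed_schema.keys()):
        changes.append(Change(Dimension.DATA, f"removed column {name}"))
    # Columns present in both whose type name differs.
    for name in sorted(expected_schema.keys() & observed_schema.keys()):
        if expected_schema[name] != observed_schema[name]:
            changes.append(Change(Dimension.DATA, f"retyped column {name}"))
    return changes


def adapt(expected_schema: dict, observed_schema: dict, changes: list) -> dict:
    """Self-adaptation step (assumed policy): adopt the observed schema if anything changed."""
    return dict(observed_schema) if changes else dict(expected_schema)
```

A pipeline would call `detect_changes` before running its operators and pass the result to `adapt`. A realistic adaptation policy would likely be far more selective than adopting the observed schema wholesale, and semantic changes, which the abstract also mentions, are not detectable by a structural comparison like this one.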
Taxonomy
Topics: Advanced Database Systems and Queries · Semantic Web and Ontologies · Data Visualization and Analytics
