Towards Evolution Capabilities in Data Pipelines

Kevin M. Kramer

arXiv:2308.14591 · cs.DB · July 29, 2025

Open Access

TL;DR

This paper emphasizes the importance of incorporating evolution capabilities into data pipeline frameworks to handle structural and semantic changes over time, proposing a conceptual model for self-awareness and self-adaptation.

Contribution

It introduces a requirements model for evolution capabilities in data pipelines, addressing a major gap in existing frameworks.

Findings

01

Identifies the need for evolution capabilities in data pipelines.

02

Provides a conceptual requirements model for self-awareness and self-adaptation.

03

Lays the foundation for a framework to manage evolutionary change.

Abstract

Evolutionary change over time is certain in the context of data pipelines, especially with regard to the structure and semantics of data and to the pipeline operators. Dealing with these changes, i.e., providing long-term maintenance, is costly. The present work explores the need for evolution capabilities within pipeline frameworks. In this context, dealing with evolution is defined as a two-step process consisting of self-awareness and self-adaptation. Furthermore, a conceptual requirements model is provided, which encompasses criteria for self-awareness and self-adaptation and covers the dimensions data, operator, pipeline, and environment. The lack of these capabilities in existing frameworks exposes a major gap. Filling this gap will be a significant contribution for practitioners and scientists alike. The present work envisions and lays the foundation for a framework which…
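
As an illustration only, the two-step process named in the abstract (self-awareness, then self-adaptation) might be sketched for the data dimension as follows. The sketch is not taken from the paper: the names `SchemaChange`, `detect_change`, and `adapt`, the representation of a schema as a dict of field names to Python types, and the policy of filling missing fields with `None` are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class SchemaChange:
    """Hypothetical report of how a record deviates from the expected schema."""
    added: set      # fields present in the record but not in the schema
    removed: set    # fields in the schema but missing from the record
    retyped: dict   # field -> (expected type name, observed type name)

    @property
    def detected(self) -> bool:
        return bool(self.added or self.removed or self.retyped)


def detect_change(expected: dict, record: dict) -> SchemaChange:
    """Self-awareness step (assumed form): notice structural change in the data."""
    added = set(record) - set(expected)
    removed = set(expected) - set(record)
    retyped = {
        name: (expected[name].__name__, type(record[name]).__name__)
        for name in set(expected) & set(record)
        if not isinstance(record[name], expected[name])
    }
    return SchemaChange(added, removed, retyped)


def adapt(expected: dict, record: dict, change: SchemaChange) -> dict:
    """Self-adaptation step (assumed policy): coerce the record to the schema.

    Missing fields become None, retyped fields are cast when possible,
    and unexpected extra fields are dropped.
    """
    adapted = {}
    for name, typ in expected.items():
        if name in change.removed:
            adapted[name] = None
        elif name in change.retyped:
            try:
                adapted[name] = typ(record[name])
            except (TypeError, ValueError):
                adapted[name] = None
        else:
            adapted[name] = record[name]
    return adapted


# Example: the upstream source renamed a field and changed a type.
expected_schema = {"id": int, "price": float, "currency": str}
incoming = {"id": "7", "price": 9.5, "region": "EU"}

change = detect_change(expected_schema, incoming)
result = adapt(expected_schema, incoming, change)
```

Here `change` reports `region` as added, `currency` as removed, and `id` as retyped from `int` to `str`, and `result` conforms to the expected schema again. A real framework would presumably cover the operator, pipeline, and environment dimensions as well, and would choose among adaptation policies rather than hard-coding one.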

Taxonomy

Topics: Advanced Database Systems and Queries · Semantic Web and Ontologies · Data Visualization and Analytics