Accelerating Fresh Data Exploration with Fluid ETL Pipelines

Maxwell Norfolk; Dong Xie

arXiv:2603.22220·cs.DB·March 24, 2026

Open Access

TL;DR

This paper introduces fluid ETL pipelines that enable flexible, on-demand data preprocessing to accelerate fresh data exploration, leveraging idle resources and adapting to evolving user interests.

Contribution

The paper proposes fluid ETL pipelines that allow dynamic, resource-efficient data preprocessing, improving fresh data exploration without blocking ingestion or requiring extensive prior knowledge.

Findings

01 Viability demonstrated on a real-world dataset

02 Fluid pipelines enable on-demand execution of data preprocessing routines (DPRs)

03 Adaptive DPR management improves exploration efficiency

Abstract

Recently, we have seen an increasing need for fresh data exploration, where data analysts seek to explore the main characteristics or detect anomalies of data being actively collected. In addition to the common challenges in classic data exploration, such as a lack of prior knowledge about the data or the analysis goal, fresh data exploration also demands an ingestion system with sufficient throughput to keep up with rapid data accumulation. However, leveraging traditional Extract-Transform-Load (ETL) pipelines to achieve low query latency can still be extremely resource-intensive as they must conduct an excessive amount of data preprocessing routines (DPRs) (e.g., parsing and indexing) to cover unpredictable data characteristics and analysis goals. To overcome this challenge, we seek to approach it from a different angle: leveraging occasional idle system capacity or cheap preemptive…
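
To make the idea of deferring DPRs concrete, here is a minimal toy sketch, not the authors' system. It assumes records arrive as comma-separated `key=value` strings, that "idle capacity" can be modeled as a budget of record visits, and that user interest can be approximated by counting queried fields. The names `FluidPipeline`, `ingest`, `query`, and `run_idle` are hypothetical and chosen only for illustration.

```python
from collections import Counter


class FluidPipeline:
    """Toy model: ingestion never waits on preprocessing; DPRs run later."""

    def __init__(self):
        self.raw = []              # records kept exactly as ingested
        self.parsed = {}           # position -> parsed dict, filled lazily
        self.index = {}            # field -> value -> set of positions
        self.indexed_upto = {}     # field -> number of records already indexed
        self.interest = Counter()  # how often each field has been queried

    def ingest(self, line):
        # Append only: no parsing or indexing on the ingestion path.
        self.raw.append(line)

    def _parse(self, pos):
        # Parsing DPR, applied on demand and cached.
        # Assumes well-formed "key=value" items separated by commas.
        if pos not in self.parsed:
            pairs = (item.split("=", 1) for item in self.raw[pos].split(","))
            self.parsed[pos] = {k.strip(): v.strip() for k, v in pairs}
        return self.parsed[pos]

    def query(self, field, value):
        # Record interest, use the index for the covered prefix,
        # and scan whatever has not been indexed yet.
        self.interest[field] += 1
        done = self.indexed_upto.get(field, 0)
        hits = set(self.index.get(field, {}).get(value, set()))
        for pos in range(done, len(self.raw)):
            if self._parse(pos).get(field) == value:
                hits.add(pos)
        return sorted(hits)

    def run_idle(self, budget):
        # Indexing DPR: spend at most `budget` record visits on the
        # most-queried field that still has unindexed records.
        for field, _ in self.interest.most_common():
            done = self.indexed_upto.get(field, 0)
            if done < len(self.raw):
                break
        else:
            return 0  # nothing queried yet, or everything is indexed
        stop = min(len(self.raw), done + budget)
        postings = self.index.setdefault(field, {})
        for pos in range(done, stop):
            rec = self._parse(pos)
            if field in rec:
                postings.setdefault(rec[field], set()).add(pos)
        self.indexed_upto[field] = stop
        return stop - done


pipeline = FluidPipeline()
for line in ["host=a,status=200", "host=b,status=500", "host=a,status=500"]:
    pipeline.ingest(line)
print(pipeline.query("status", "500"))  # answered by scanning raw records
print(pipeline.run_idle(budget=2))      # idle capacity indexes part of the data
```

In this sketch a query stays correct whether or not any idle work has run; the index only shortens the scan. That property is what lets preprocessing be postponed or preempted without blocking ingestion.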

Taxonomy

Topics: Data Quality and Management · Data Management and Algorithms · Advanced Database Systems and Queries