INGESTBASE: A Declarative Data Ingestion System
Alekh Jindal, Jorge-Arnulfo Quiane-Ruiz, Samuel Madden

TL;DR
INGESTBASE is a declarative system that streamlines big data ingestion by allowing systematic planning, optimization, and fault-tolerant execution of data preprocessing directly during ingestion, improving efficiency and flexibility.
Contribution
It introduces a declarative ingestion language and optimizer for systematic, efficient, and flexible data ingestion in big data systems, integrating preprocessing into ingestion workflows.
Findings
Supports various ingestion techniques with low overhead
Achieves up to 6x performance improvement over traditional post-ingestion processing
Provides efficient, fault-tolerant distributed ingestion and data access
Abstract
Big data applications have fast-arriving data that must be ingested quickly. At the same time, they have specific needs to preprocess and transform the data before it can be put to use. The current practice is to do these preparatory transformations once the data is already ingested; however, this is expensive to run and cumbersome to manage. As a result, there is a need to push data preprocessing down to the ingestion itself. In this paper, we present a declarative data ingestion system, called INGESTBASE, to allow application developers to plan and specify their data ingestion logic in a more systematic manner. We introduce the notion of ingestion plans, analogous to query plans, and present a declarative ingestion language to help developers easily build sophisticated ingestion plans. INGESTBASE provides an extensible ingestion optimizer to rewrite and optimize ingestion plans by…
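To make the idea of an ingestion plan concrete, the following is a minimal Python sketch of preprocessing applied while records are being ingested rather than in a separate pass afterwards. It is an illustration only and does not reproduce INGESTBASE's ingestion language, optimizer, or API: the names `IngestionPlan`, `parse_csv`, `select`, and `partition_by`, as well as the sample records, are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, str]
# Hypothetical operator type: consumes a stream and yields a transformed stream.
Operator = Callable[[Iterable], Iterator]


def parse_csv(fields: List[str]) -> Operator:
    """Turn raw comma-separated lines into records keyed by field name."""
    def op(lines):
        for line in lines:
            yield dict(zip(fields, line.strip().split(",")))
    return op


def select(predicate: Callable[[Record], bool]) -> Operator:
    """Drop records that fail the predicate before they are stored."""
    def op(records):
        return (r for r in records if predicate(r))
    return op


def partition_by(key: str, store: Dict[str, List[Record]]) -> Operator:
    """Write each record into the partition named by its key value."""
    def op(records):
        for r in records:
            store.setdefault(r[key], []).append(r)
            yield r
    return op


@dataclass
class IngestionPlan:
    """A chain of operators, loosely analogous to a query plan (assumed design)."""
    operators: List[Operator] = field(default_factory=list)

    def then(self, op: Operator) -> "IngestionPlan":
        self.operators.append(op)
        return self

    def run(self, source: Iterable[str]) -> int:
        stream = source
        for op in self.operators:
            stream = op(stream)
        # Pull records through the lazy pipeline; return how many were stored.
        return sum(1 for _ in stream)


# Made-up sample data: parse, filter, and partition in a single ingestion pass.
store: Dict[str, List[Record]] = {}
plan = (
    IngestionPlan()
    .then(parse_csv(["id", "country", "amount"]))
    .then(select(lambda r: float(r["amount"]) > 0))
    .then(partition_by("country", store))
)
ingested = plan.run(["1,US,10.5", "2,DE,0", "3,US,7"])
```

Because the plan is an explicit list of operators, a rewriting step could in principle reorder or merge them before execution, which is the kind of opportunity the abstract attributes to an ingestion optimizer. This sketch performs no such optimization.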
Taxonomy
Topics: Data Quality and Management · Advanced Database Systems and Queries · Cloud Data Security Solutions
