Stream DaQ: Stream-First Data Quality Monitoring
Vasileios Papastergios, Anastasios Gounaris

TL;DR
Stream DaQ introduces a novel, real-time data quality monitoring framework tailored for unbounded data streams, enabling dynamic, context-aware assessments that outperform existing static and batch-oriented approaches.
Contribution
The paper presents a new streaming data quality model with configurable windowing and adaptive constraints, implemented in an open-source Python framework that unifies fragmented quality checks.
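The summary does not show Stream DaQ's own API, so the following is only a plain-Python sketch of the general idea of windowed quality checks. It assumes that events arrive in event-time order with no late data, and it uses a fixed-size tumbling window. The names `Event`, `QualityRecord` and `tumbling_quality`, and the two checks (completeness and a value range), are hypothetical illustrations, not the framework's interface. Each closed window emits one quality record, so the output is itself a small "quality meta-stream".

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class Event:
    timestamp: float          # event time in seconds (hypothetical schema)
    value: Optional[float]    # None models a missing reading


@dataclass
class QualityRecord:
    window_start: float
    window_end: float
    count: int
    completeness: float       # fraction of events with a non-missing value
    out_of_range: int         # present values outside [lo, hi]
    passed: bool


def tumbling_quality(events: Iterable[Event], window_size: float,
                     lo: float, hi: float,
                     min_completeness: float = 0.9) -> Iterator[QualityRecord]:
    """Hypothetical sketch: one QualityRecord per non-empty tumbling window.

    Assumes events are ordered by timestamp; windows with no events are skipped.
    """
    def assess(start: float, items: List[Event]) -> QualityRecord:
        present = [e.value for e in items if e.value is not None]
        completeness = len(present) / len(items)
        out = sum(1 for v in present if not lo <= v <= hi)
        ok = completeness >= min_completeness and out == 0
        return QualityRecord(start, start + window_size, len(items),
                             completeness, out, ok)

    current_start: Optional[float] = None
    buffer: List[Event] = []
    for e in events:
        # Align each event to the start of its tumbling window.
        start = (e.timestamp // window_size) * window_size
        if current_start is None:
            current_start = start
        elif start != current_start:
            # The previous window is complete: emit its quality record.
            yield assess(current_start, buffer)
            buffer, current_start = [], start
        buffer.append(e)
    if buffer:
        yield assess(current_start, buffer)
```

For example, with `window_size=5.0` and a valid range of `[0, 100]`, the events at t=0.5, 1.2 (missing), 3.1 and 4.0 (value 250) fall into the window `[0, 5)`. That window fails on both completeness (0.75) and range, while a single valid event at t=5.5 yields a passing record for `[5, 10)`.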
Findings
Significantly lower execution time and higher throughput than existing solutions.
Provides richer, context-aware quality assessments through native streaming capabilities.
Seamlessly integrates with modern data science workflows.
Abstract
Data quality is fundamental to modern data science workflows, where data continuously flows as unbounded streams feeding critical downstream tasks, from elementary analytics to advanced artificial intelligence models. Existing data quality approaches either focus exclusively on static data or treat streaming as an extension of batch processing, lacking the temporal granularity and contextual awareness required for true streaming applications. In this paper, we present a novel data quality monitoring model specifically designed for unbounded data streams. Our model introduces stream-first concepts, such as configurable windowing mechanisms, dynamic constraint adaptation, and continuous assessment that produces quality meta-streams for real-time pipeline awareness. To demonstrate practical applicability, we developed Stream DaQ, an open-source Python framework that implements our…
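"Dynamic constraint adaptation" can be illustrated with a constraint whose bounds are recomputed from recent windows instead of being fixed in advance. The sketch below is an assumption for illustration only and does not describe the paper's actual mechanism. It treats a window's mean as acceptable if it lies within `z` standard deviations of the mean of the last `history` windows. The function name `adaptive_checks` and the parameters `history`, `z` and `warmup` are hypothetical.

```python
import statistics
from collections import deque
from typing import Iterable, Iterator, Tuple


def adaptive_checks(window_means: Iterable[float], history: int = 5,
                    z: float = 3.0, warmup: int = 3) -> Iterator[Tuple[float, bool]]:
    """Hypothetical sketch: yield (window_mean, ok) with bounds that adapt over time.

    The constraint for each window is derived from the previous `history`
    windows. Until `warmup` windows have been seen, every window passes.
    `warmup` should not exceed `history`, or the check never activates.
    """
    recent: deque = deque(maxlen=history)
    for m in window_means:
        if len(recent) < warmup:
            ok = True  # not enough history yet to derive bounds
        else:
            mu = statistics.mean(recent)
            sigma = statistics.pstdev(recent)
            ok = abs(m - mu) <= z * sigma
        yield m, ok
        # Every window joins the history so the bounds can follow gradual drift.
        recent.append(m)
```

Applied to the window means `[10.0, 10.5, 9.5, 10.2, 30.0, 10.1]`, only the spike at 30.0 is flagged. Because the spike also enters the history, it temporarily widens the bounds for the next few windows. Excluding failing windows from the history would keep the bounds tight, but the constraint would then never adapt to a genuine level shift.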
Taxonomy
Topics: Data Quality and Management · Scientific Computing and Data Management · Advanced Database Systems and Queries
