Data Cleaning of Data Streams
Valerie Restat, Niklas Rodenhausen, Carina Antonin, Uta Störl

TL;DR
This paper explores the particular challenges of cleaning streaming data. It analyzes how applicable data cleaning is to data streams, evaluates the theoretical considerations in experiments, and shows with a prototype framework that cleaning is not consistent when working with data streams.
Contribution
It provides a detailed analysis of data cleaning for data streams, evaluates theoretical considerations with experiments, and investigates streaming technology requirements.
Findings
Cleaning is inconsistent when applied to data streams.
Theoretical considerations are evaluated in comprehensive experiments using a prototype framework.
Requirements for streaming technologies with respect to data cleaning are investigated.
Abstract
Streaming data can arise from a variety of contexts. Important use cases are continuous sensor measurements such as temperature, light or radiation values. In the process, streaming data may also contain data errors that should be cleaned before further use. Many studies from science and practice focus on data cleaning in a static context. However, in terms of data cleaning, streaming data has particularities that distinguish it from static data. In this paper, we have therefore undertaken an intensive exploration of data cleaning of data streams. We provide a detailed analysis of the applicability of data cleaning to data streams. Our theoretical considerations are evaluated in comprehensive experiments. Using a prototype framework, we show that cleaning is not consistent when working with data streams. An additional contribution is the investigation of requirements for streaming…
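One way to picture why cleaning a stream can give different results from cleaning the same data statically is the following minimal sketch. It is not the paper's prototype framework: the choice of mean imputation for missing values, the tumbling window of three readings, the sample temperature values, and the function names `clean_static` and `clean_windowed` are all assumptions made for illustration.

```python
from typing import List, Optional

Reading = Optional[float]  # None marks a missing sensor value


def clean_static(values: List[Reading]) -> List[Reading]:
    """Fill missing values with the mean of the whole data set."""
    present = [v for v in values if v is not None]
    fill = sum(present) / len(present) if present else None
    return [fill if v is None else v for v in values]


def clean_windowed(values: List[Reading], window_size: int) -> List[Reading]:
    """Fill missing values using only the tumbling window they arrive in.

    This mimics a stream processor that sees one window at a time
    (hypothetical setup, not the authors' framework).
    """
    cleaned: List[Reading] = []
    for start in range(0, len(values), window_size):
        window = values[start:start + window_size]
        present = [v for v in window if v is not None]
        fill = sum(present) / len(present) if present else None
        cleaned.extend(fill if v is None else v for v in window)
    return cleaned


# Illustrative temperature readings with two missing values.
readings: List[Reading] = [20.0, None, 22.0, 30.0, None, 32.0]

static_result = clean_static(readings)
stream_result = clean_windowed(readings, window_size=3)

print(static_result)  # both gaps filled with the global mean
print(stream_result)  # each gap filled with its own window's mean
```

In this sketch the same missing reading is repaired with a different value depending on how much of the data the cleaning step can see, which is one simple reading of the inconsistency the abstract describes.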
