Progressive Data Science: Potential and Challenges
Cagatay Turkay, Nicola Pezzotti, Carsten Binnig, Hendrik Strobelt, Barbara Hammer, Daniel A. Keim, Jean-Daniel Fekete, Themis Palpanas, Yunhai Wang, Florin Rusu

TL;DR
Progressive data science aims to accelerate the iterative data analysis process by providing quick, approximate results at each step, enabling early detection of issues and more efficient refinement.
Contribution
This paper discusses the challenges and potential solutions for implementing progressive computation in various stages of the data science pipeline.
Findings
Progressive approximations can speed up data science workflows.
Early feedback helps detect errors and guide modifications.
Challenges include computing accurate approximations early in the pipeline.
Abstract
Data science requires time-consuming iterative manual activities. In particular, activities such as data selection, preprocessing, transformation, and mining depend heavily on iterative trial-and-error processes that could be sped up significantly by providing quick feedback on the impact of changes. The idea of progressive data science is to compute the results of changes progressively, returning a first approximation quickly and allowing iterative refinement until the computation converges to a final result. Enabling the user to interact with intermediate results allows early detection of erroneous or suboptimal choices, guided definition of modifications to the pipeline, and quick assessment of those modifications. In this paper, we discuss the progressiveness challenges arising in different steps of the data science pipeline. We describe how changes in each step of the pipeline impact…
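As a rough illustration of the "first approximation, then refinement" idea in the abstract, the sketch below computes a mean progressively, emitting an estimate after each chunk of data instead of waiting for a full pass. It is not taken from the paper: the function name `progressive_mean`, the chunk size, and the synthetic data are all hypothetical, and the reported standard error is only meaningful under the assumption that rows arrive in random order.

```python
import math
import random
from typing import Iterator, Sequence, Tuple


def progressive_mean(
    values: Sequence[float], chunk_size: int
) -> Iterator[Tuple[int, float, float]]:
    """Yield (rows_seen, running_mean, standard_error) after each chunk.

    Hypothetical helper for illustration; uses Welford's online update so
    that each new chunk refines the previous estimate without a re-scan.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for start in range(0, len(values), chunk_size):
        for x in values[start:start + chunk_size]:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
        variance = m2 / (n - 1) if n > 1 else 0.0
        # Standard error assumes the rows seen so far are a random sample.
        yield n, mean, math.sqrt(variance / n)


# Synthetic data (an assumption for the demo, not from the paper).
random.seed(0)
data = [random.gauss(50.0, 10.0) for _ in range(10_000)]

estimates = list(progressive_mean(data, chunk_size=1_000))
for rows_seen, estimate, stderr in estimates:
    print(f"{rows_seen:>6} rows: mean ~ {estimate:.2f} (+/- {stderr:.2f})")
```

A user watching these intermediate lines could stop early once the uncertainty is small enough, or notice an implausible estimate and change an upstream step before the full computation finishes.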
Taxonomy
Topics: Data Stream Mining Techniques · Data Visualization and Analytics · Anomaly Detection Techniques and Applications
