TL;DR
This paper proposes using shadow pipelines to automatically generate interactive suggestions for improving ML data preparation code, aiming to assist data scientists in debugging and refining their pipelines more efficiently.
Contribution
It introduces the concept of shadow pipelines for auto-detecting issues and suggesting improvements, with an emphasis on low-latency computation through incremental view maintenance.
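To make the incremental view maintenance idea concrete, here is a minimal sketch of the general technique, not the paper's implementation. It keeps a per-group mean up to date from inserted and deleted rows instead of recomputing it over the full dataset. The class `IncrementalGroupMean` and its methods are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class IncrementalGroupMean:
    """Hypothetical materialized view: per-group mean maintained from deltas.

    Only a count and a running sum are stored per group, so an update costs
    time proportional to the size of the delta, not the size of the data.
    """

    def __init__(self):
        self._count = defaultdict(int)
        self._total = defaultdict(float)

    def apply_delta(self, inserted=(), deleted=()):
        """Apply (group, value) rows that were added to or removed from the input."""
        for group, value in inserted:
            self._count[group] += 1
            self._total[group] += value
        for group, value in deleted:
            # Assumes the deleted row was previously inserted.
            self._count[group] -= 1
            self._total[group] -= value
            if self._count[group] == 0:
                # Drop empty groups so they no longer appear in the view.
                del self._count[group]
                del self._total[group]

    def means(self):
        """Return the current view contents as {group: mean}."""
        return {g: self._total[g] / self._count[g] for g in self._count}


view = IncrementalGroupMean()
view.apply_delta(inserted=[("a", 1.0), ("a", 3.0), ("b", 10.0)])
view.apply_delta(deleted=[("a", 1.0)])  # only the changed row is processed
```

The same pattern, storing enough intermediate state to absorb a small change cheaply, is what would let a shadow pipeline be refreshed after a code or data edit without rerunning it from scratch.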
Findings
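Preliminary experiments indicate that generating suggestions with shadow pipelines is feasible.
Automatically derived suggestions could reduce the manual effort of screening and debugging pipelines.
Incremental view maintenance-based optimizations are intended to keep computing and updating the shadow pipelines low-latency.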
Abstract
Data scientists develop ML pipelines in an iterative manner: they repeatedly screen a pipeline for potential issues, debug it, and then revise and improve its code according to their findings. However, this manual process is tedious and error-prone. Therefore, we propose to support data scientists during this development cycle with automatically derived interactive suggestions for pipeline improvements. We discuss our vision to generate these suggestions with so-called shadow pipelines, hidden variants of the original pipeline that modify it to auto-detect potential issues, try out modifications for improvements, and suggest and explain these modifications to the user. We envision to apply incremental view maintenance-based optimisations to ensure low-latency computation and maintenance of the shadow pipelines. We conduct preliminary experiments to showcase the feasibility of our…
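The shadow pipeline idea from the abstract can be illustrated with a toy sketch. It is an assumption-laden illustration and not the authors' system: a pipeline is modeled as a list of functions over rows, a shadow variant swaps one step, and a suggestion is emitted when the variant scores better on a chosen metric. All names here (`drop_missing`, `impute_mean`, `run_pipeline`, `retained_fraction`, `suggest`) and the `age` column are hypothetical.

```python
from statistics import mean


def drop_missing(rows):
    """Original step: discard rows whose 'age' value is missing."""
    return [r for r in rows if r["age"] is not None]


def impute_mean(rows):
    """Shadow step: fill missing 'age' values with the mean of the known ones."""
    known = [r["age"] for r in rows if r["age"] is not None]
    fill = mean(known) if known else 0.0
    return [{**r, "age": fill if r["age"] is None else r["age"]} for r in rows]


def run_pipeline(steps, rows):
    """Apply each step in order and return the resulting rows."""
    for step in steps:
        rows = step(rows)
    return rows


def retained_fraction(input_rows, output_rows):
    """Toy metric: fraction of input rows that survive the pipeline."""
    return len(output_rows) / len(input_rows) if input_rows else 1.0


def suggest(original_steps, shadow_variants, rows, score):
    """Run the original and each shadow variant; report variants that score higher."""
    baseline = score(rows, run_pipeline(original_steps, rows))
    suggestions = []
    for description, steps in shadow_variants:
        shadow_score = score(rows, run_pipeline(steps, rows))
        if shadow_score > baseline:
            suggestions.append((description, baseline, shadow_score))
    return suggestions


rows = [{"age": 30}, {"age": None}, {"age": 50}, {"age": None}]
suggestions = suggest(
    [drop_missing],
    [("replace drop_missing with impute_mean", [impute_mean])],
    rows,
    retained_fraction,
)
```

In this sketch the user only ever runs the original pipeline; the variants execute out of sight and surface as a description plus before and after scores, which is the kind of explained suggestion the abstract describes. A real system would presumably use richer metrics, such as model accuracy or fairness measures, than the row-retention metric assumed here.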
