Towards Interactively Improving ML Data Preparation Code via "Shadow Pipelines"

Stefan Grafberger; Paul Groth; Sebastian Schelter

arXiv:2404.19591·cs.DB·May 1, 2024

TL;DR

This paper proposes using shadow pipelines to automatically generate interactive suggestions for improving ML data preparation code, aiming to assist data scientists in debugging and refining their pipelines more efficiently.

Contribution

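The paper introduces shadow pipelines: hidden variants of the original pipeline that auto-detect potential issues, try out modifications, and suggest improvements to the user. It envisions incremental view maintenance-based optimisations to keep the computation and maintenance of these variants low-latency.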
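To make the idea of incremental view maintenance concrete, here is a minimal sketch of how an aggregate could be updated from a small delta instead of being recomputed from scratch. The class `IncrementalGroupMean`, the helper `recompute_means`, and the toy rows are all hypothetical illustrations and are not taken from the paper or its repository.

```python
from collections import defaultdict


class IncrementalGroupMean:
    """Hypothetical view: keeps per-group sum and count so means can be
    updated from inserted/deleted rows without rescanning all data."""

    def __init__(self):
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def apply_delta(self, inserted=(), deleted=()):
        # Each row is a (group, value) pair.
        for group, value in inserted:
            self.sums[group] += value
            self.counts[group] += 1
        for group, value in deleted:
            self.sums[group] -= value
            self.counts[group] -= 1
            if self.counts[group] == 0:
                # Drop empty groups so they do not appear in the view.
                del self.sums[group]
                del self.counts[group]

    def means(self):
        return {g: self.sums[g] / self.counts[g] for g in self.counts}


def recompute_means(rows):
    """Baseline: full recomputation over all rows."""
    grouped = defaultdict(list)
    for group, value in rows:
        grouped[group].append(value)
    return {g: sum(vs) / len(vs) for g, vs in grouped.items()}


rows = [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 20.0)]

# Build the view once for the original pipeline output.
view = IncrementalGroupMean()
view.apply_delta(inserted=rows)

# A pipeline variant removes one row; only that delta is applied to the view.
removed = [("b", 20.0)]
view.apply_delta(deleted=removed)
incremental_result = view.means()

# Cross-check against recomputing from scratch.
full_result = recompute_means([r for r in rows if r not in removed])
print(incremental_result == full_result)
```

The work done by `apply_delta` is proportional to the size of the delta rather than the size of the input, which is the property that would make maintaining many pipeline variants cheap.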

Findings

01 Preliminary experiments show the feasibility of shadow pipelines.

02 Shadow pipelines have the potential to reduce manual debugging effort.

03 Optimizations enable low-latency updates.

Abstract

Data scientists develop ML pipelines in an iterative manner: they repeatedly screen a pipeline for potential issues, debug it, and then revise and improve its code according to their findings. However, this manual process is tedious and error-prone. Therefore, we propose to support data scientists during this development cycle with automatically derived interactive suggestions for pipeline improvements. We discuss our vision to generate these suggestions with so-called shadow pipelines, hidden variants of the original pipeline that modify it to auto-detect potential issues, try out modifications for improvements, and suggest and explain these modifications to the user. We envision to apply incremental view maintenance-based optimisations to ensure low-latency computation and maintenance of the shadow pipelines. We conduct preliminary experiments to showcase the feasibility of our…
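The detect, modify, and suggest loop described in the abstract can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the authors' implementation: the step functions, the `group_share` check, the `tolerance` threshold, and the sample rows are all hypothetical.

```python
from statistics import mean


def drop_missing_age(rows):
    """Original (hypothetical) preparation step: discard rows without an age."""
    return [r for r in rows if r["age"] is not None]


def impute_missing_age(rows):
    """Alternative step tried by a shadow variant: fill gaps with the mean age."""
    fill = mean(r["age"] for r in rows if r["age"] is not None)
    return [dict(r, age=fill if r["age"] is None else r["age"]) for r in rows]


def run_pipeline(steps, rows):
    for step in steps:
        rows = step(rows)
    return rows


def group_share(rows, group):
    """Fraction of rows that belong to the given group."""
    return sum(r["group"] == group for r in rows) / len(rows)


def suggest(original_steps, shadow_variants, rows, group, tolerance=0.05):
    """Detect a drop in a group's share, then test each shadow variant."""
    baseline = group_share(rows, group)
    original_share = group_share(run_pipeline(original_steps, rows), group)
    if baseline - original_share <= tolerance:
        return []  # no issue detected, nothing to suggest
    suggestions = []
    for description, steps in shadow_variants:
        shadow_share = group_share(run_pipeline(steps, rows), group)
        if baseline - shadow_share <= tolerance:
            # The variant resolves the issue; report before/after as explanation.
            suggestions.append((description, original_share, shadow_share))
    return suggestions


rows = [
    {"group": "A", "age": 30}, {"group": "A", "age": 40},
    {"group": "A", "age": 50}, {"group": "B", "age": None},
    {"group": "B", "age": None}, {"group": "B", "age": 20},
]
original = [drop_missing_age]
variants = [("replace drop_missing_age with mean imputation", [impute_missing_age])]

suggestions = suggest(original, variants, rows, group="B")
for description, before, after in suggestions:
    print(f"{description}: share of group B {before:.2f} -> {after:.2f}")
```

In this toy, the user only ever sees the original pipeline and the resulting suggestion; the variant runs in the background, which is the sense in which it is a "shadow" of the original.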

Code & Models

Repositories

stefan-grafberger/shadow-pipeline-experiments (Official)