DiffML: End-to-end Differentiable ML Pipelines
Benjamin Hilprecht, Christian Hammacher, Eduardo Reis, Mohamed Abdelaal, Carsten Binnig

TL;DR
DiffML introduces a novel approach to constructing end-to-end differentiable machine learning pipelines, enabling joint training of data preprocessing and modeling steps through backpropagation, which opens new research avenues.
Contribution
This paper proposes the concept of differentiable ML pipelines, demonstrating how typical preprocessing steps can be formulated as differentiable programs for joint optimization.
Findings
Initial ideas for differentiable data cleaning and feature selection
Feasibility of joint training of preprocessing and models
Discussion of research challenges and future directions
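To make the data cleaning finding concrete, here is a minimal sketch of one way a repair step could be written as a differentiable program. It is an illustration under stated assumptions, not the authors' formulation: the synthetic data, the three candidate repairs, the softmax relaxation, and names such as `theta`, `candidates`, and `learned_fill` are all hypothetical. Missing cells are filled with a softmax-weighted mixture of candidate values, and the mixture weights are updated by the same gradient steps that fit a one-parameter linear model. Gradients are derived by hand so that only NumPy is needed.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Synthetic data (assumption): one feature, linear target, 30% missing cells.
rng = np.random.default_rng(1)
n = 300
x_true = rng.normal(loc=2.0, scale=1.0, size=n)
y = 1.5 * x_true + 0.05 * rng.normal(size=n)
missing = rng.random(n) < 0.3
observed = x_true[~missing]

# Hypothetical candidate repairs for a missing cell: zero, observed mean, observed max.
candidates = np.array([0.0, observed.mean(), observed.max()])

theta = np.zeros(3)  # logits over candidate repairs (cleaning parameters)
w = 0.0              # model parameter
lr = 0.05

for _ in range(2000):
    p = softmax(theta)
    fill = p @ candidates                      # soft repair value
    x = np.where(missing, fill, x_true)        # the model never sees true missing values
    err = w * x - y
    grad_w = 2.0 * np.mean(err * x)            # d(MSE)/dw
    grad_fill = 2.0 * w * np.mean(err * missing)      # d(MSE)/d(fill), missing rows only
    grad_theta = grad_fill * p * (candidates - fill)  # chain rule through the softmax
    w -= lr * grad_w
    theta -= lr * grad_theta

repair_weights = softmax(theta)
learned_fill = repair_weights @ candidates
print("repair weights:", repair_weights)
print("learned fill:", learned_fill, "model weight:", w)
```

In this sketch the downstream loss is the only signal that steers the repair, which is the sense in which cleaning and model are trained jointly. Note that the mixture need not concentrate on a single candidate: any weighting that yields a good fill value is equally acceptable to the loss.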
Abstract
In this paper, we present our vision of differentiable ML pipelines, called DiffML, to automate the construction of ML pipelines in an end-to-end fashion. The idea is that DiffML makes it possible to jointly train not just the ML model itself but also the entire pipeline, including data preprocessing steps such as data cleaning and feature selection. Our core idea is to formulate all pipeline steps in a differentiable way such that the entire pipeline can be trained using backpropagation. However, this is a non-trivial problem and opens up many new research questions. To show the feasibility of this direction, we demonstrate initial ideas and a general principle of how typical preprocessing steps such as data cleaning, feature selection and dataset selection can be formulated as differentiable programs and jointly learned with the ML model. Moreover, we discuss a research roadmap and core challenges…
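The abstract's idea of learning feature selection jointly with the model can be sketched as follows. This is a hedged illustration, not the paper's method: it assumes a sigmoid gate per feature, a linear model, an L1-style penalty on the gates, and a small L2 penalty on the weights (added so that a shrinking gate cannot simply be compensated by a growing weight). The function name `train_gated_linear`, the hyperparameters, and the synthetic data are all hypothetical. Gradients are written out by hand so that the example depends only on NumPy.

```python
import numpy as np

def train_gated_linear(X, y, l1=0.05, l2=0.01, lr=0.1, steps=2000, seed=0):
    """Jointly fit feature gates and linear weights by gradient descent.

    Loss = mean((X @ (g * w) - y)**2) + l1 * sum(g) + l2 * sum(w**2),
    where g = sigmoid(a) is a soft feature mask in (0, 1).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(scale=0.1, size=d)  # model weights
    a = np.zeros(d)                    # gate logits (selection parameters)
    for _ in range(steps):
        g = 1.0 / (1.0 + np.exp(-a))
        err = X @ (g * w) - y
        grad_eff = 2.0 * (X.T @ err) / n               # gradient w.r.t. g * w
        grad_w = grad_eff * g + 2.0 * l2 * w
        grad_a = (grad_eff * w + l1) * g * (1.0 - g)   # chain rule through the sigmoid
        w -= lr * grad_w
        a -= lr * grad_a
    return w, 1.0 / (1.0 + np.exp(-a))

# Synthetic data (assumption): only the first two of five features are informative.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + 0.1 * rng.normal(size=200)

w, gates = train_gated_linear(X, y)
print("gates:", np.round(gates, 3))
```

The gates remain continuous during training, so a hard selection would require a separate thresholding step afterwards. Other relaxations would fit the same pattern.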
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Topic Modeling
Methods: Feature Selection
