Automatically Debugging AutoML Pipelines using Maro: ML Automated Remediation Oracle (Extended Version)
Julian Dolby, Jason Tsay, Martin Hirzel

TL;DR
This paper introduces Maro, an automated tool that diagnoses and fixes failures in ML pipelines by combining AutoML with satisfiability modulo theories (SMT) solving, improving robustness without sacrificing performance.
Contribution
Maro is a novel system that automatically explains and repairs ML pipeline failures, integrating seamlessly with existing data science tools.
Findings
Most errors are fixed with a single automated remediation.
Remediation does not significantly affect accuracy or convergence time.
Maro works within the familiar data science ecosystem, including Python, Jupyter notebooks, scikit-learn, and AutoML tools such as Hyperopt.
Abstract
Machine learning in practice often involves complex pipelines for data cleansing, feature engineering, preprocessing, and prediction. These pipelines are composed of operators, which have to be correctly connected and whose hyperparameters must be correctly configured. Unfortunately, it is quite common for certain combinations of datasets, operators, or hyperparameters to cause failures. Diagnosing and fixing those failures is tedious and error-prone and can seriously derail a data scientist's workflow. This paper describes an approach for automatically debugging an ML pipeline, explaining the failures, and producing a remediation. We implemented our approach, which builds on a combination of AutoML and SMT, in a tool called Maro. Maro works seamlessly with the familiar data science ecosystem including Python, Jupyter notebooks, scikit-learn, and AutoML tools such as Hyperopt. We…
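To make the failure-and-remediation idea concrete, here is a minimal toy sketch. It is not Maro's API or algorithm. The search space, the `is_valid` constraints, and the `remediate` helper are all hypothetical, and a brute-force search stands in for the SMT solving the abstract mentions. The two constraints mirror well-known scikit-learn restrictions: PCA cannot keep more components than there are features, and the `lbfgs` solver does not support the `l1` penalty.

```python
from itertools import product

# Hypothetical search space for a two-step pipeline (PCA, then logistic regression).
SPACE = {
    "pca__n_components": [2, 4, 8, 16],
    "clf__penalty": ["l1", "l2"],
    "clf__solver": ["lbfgs", "liblinear"],
}


def is_valid(config, n_features):
    """Toy constraints standing in for those a real tool would encode or learn."""
    # PCA cannot keep more components than the dataset has features.
    if config["pca__n_components"] > n_features:
        return False
    # The lbfgs solver does not support the l1 penalty.
    if config["clf__penalty"] == "l1" and config["clf__solver"] == "lbfgs":
        return False
    return True


def remediate(config, n_features):
    """Return a valid configuration that changes the fewest hyperparameters, or None.

    Brute force over SPACE (an assumption for illustration; not an SMT encoding).
    Ties are broken by enumeration order.
    """
    keys = list(SPACE)
    best = None
    for values in product(*(SPACE[k] for k in keys)):
        candidate = dict(zip(keys, values))
        if not is_valid(candidate, n_features):
            continue
        changes = sum(candidate[k] != config[k] for k in keys)
        if best is None or changes < best[0]:
            best = (changes, candidate)
    return None if best is None else best[1]


# A configuration that fails because of an incompatible penalty/solver pair.
failing = {"pca__n_components": 4, "clf__penalty": "l1", "clf__solver": "lbfgs"}
fixed = remediate(failing, n_features=5)
print("failing config valid?", is_valid(failing, n_features=5))
print("suggested remediation:", fixed)
```

In this sketch a single hyperparameter change is enough to repair the failing configuration, which is the kind of small, targeted fix the findings describe. A real tool would also need to explain which constraint was violated and check that the repaired pipeline still trains well.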
