SubStrat: A Subset-Based Strategy for Faster AutoML
Teddy Lazebnik, Amit Somech, Abraham Itzhak Weinberg

TL;DR
SubStrat is an AutoML optimization strategy that uses a genetic algorithm to select a small, representative data subset, significantly reducing execution time while maintaining high accuracy.
Contribution
SubStrat introduces a data subset selection approach that wraps existing AutoML tools and improves their efficiency without sacrificing much accuracy.
Findings
Reduces AutoML runtime by 79% on average.
Maintains less than 2% accuracy loss.
Effective on Auto-Sklearn and TPOT.
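To put the 79% figure in perspective, here is a small worked conversion from a runtime reduction to a speedup factor. The 60-minute baseline and the helper name `speedup_from_reduction` are hypothetical, chosen only for illustration. The reported number is an average across experiments, so this arithmetic applies to a single run with exactly that reduction, not to the average speedup.

```python
def speedup_from_reduction(reduction):
    """Convert a fractional runtime reduction (0 <= r < 1) into a speedup factor."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


baseline_minutes = 60.0                          # hypothetical full-data AutoML run
reduced_minutes = baseline_minutes * (1 - 0.79)  # runtime left after a 79% cut

print(round(reduced_minutes, 1))                 # 12.6
print(round(speedup_from_reduction(0.79), 2))    # 4.76
```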
Abstract
Automated machine learning (AutoML) frameworks have become important tools in the data scientists' arsenal, as they dramatically reduce the manual work devoted to the construction of ML pipelines. Such frameworks intelligently search among millions of possible ML pipelines (typically containing feature engineering, model selection, and hyperparameter tuning steps) and finally output an optimal pipeline in terms of predictive accuracy. However, when the dataset is large, each individual configuration takes longer to execute, so overall AutoML running times become increasingly high. To this end, we present SubStrat, an AutoML optimization strategy that tackles the data size rather than the configuration space. It wraps existing AutoML tools, and instead of executing them directly on the entire dataset, SubStrat uses a genetic-based algorithm to find a small yet representative…
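The subset-search idea described above can be illustrated with a minimal sketch. This is not the authors' algorithm: the fitness function (matching column means between the subset and the full data), the row-only selection, and all names and parameters (`select_subset`, `pop_size`, `generations`, the 0.3 mutation rate) are assumptions made for illustration. It uses only the Python standard library.

```python
import random
from statistics import mean


def column_means(rows):
    """Mean of each column over a list of equal-length numeric rows."""
    return [mean(col) for col in zip(*rows)]


def fitness(data, indices, full_means):
    """Hypothetical representativeness score; values closer to 0 are better.

    Negative sum of absolute gaps between subset and full-data column means.
    """
    sub_means = column_means([data[i] for i in indices])
    return -sum(abs(s - f) for s, f in zip(sub_means, full_means))


def select_subset(data, k, pop_size=20, generations=30, seed=0):
    """Genetic search for k row indices whose column means track the full data."""
    rng = random.Random(seed)
    n = len(data)
    full_means = column_means(data)

    def score(ind):
        return fitness(data, ind, full_means)

    # Initial population: random k-row subsets stored as sorted index tuples.
    population = [tuple(sorted(rng.sample(range(n), k))) for _ in range(pop_size)]

    for _ in range(generations):
        population.sort(key=score, reverse=True)
        survivors = population[: pop_size // 2]   # elitism: keep the best half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            pool = sorted(set(a) | set(b))        # crossover: pool both parents' rows
            child = set(rng.sample(pool, k))
            if rng.random() < 0.3:                # mutation: swap one row index
                child.remove(rng.choice(sorted(child)))
                outside = [i for i in range(n) if i not in child]
                child.add(rng.choice(outside))
            children.append(tuple(sorted(child)))
        population = survivors + children

    return max(population, key=score)


# Usage on synthetic data: pick 20 of 200 rows.
demo_rng = random.Random(1)
data = [[demo_rng.gauss(0, 1), demo_rng.uniform(0, 10)] for _ in range(200)]
subset = select_subset(data, k=20)
reduced = [data[i] for i in subset]  # rows that would be handed to the AutoML tool
```

In a wrapper of this kind, `reduced` would be passed to the AutoML tool in place of the full dataset. Keeping the best half of each generation means the best score never gets worse between generations, at the cost of some population diversity.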
Taxonomy
Topics: Machine Learning and Data Classification · Data Stream Mining Techniques · Metaheuristic Optimization Algorithms Research
