SubStrat: A Subset-Based Strategy for Faster AutoML

Teddy Lazebnik; Amit Somech; Abraham Itzhak Weinberg

arXiv:2206.03070 · cs.LG · December 31, 2024 · 1 citation

TL;DR

SubStrat is an AutoML optimization strategy that uses a genetic algorithm to select a small, representative data subset, significantly reducing execution time while keeping accuracy high.

Contribution

The paper introduces a data-subset selection approach that wraps existing AutoML tools and improves their efficiency without sacrificing much accuracy.

Findings

01

Reduces AutoML runtime by 79% on average.

02

Keeps accuracy loss below 2%.

03

Effective on Auto-Sklearn and TPOT.

Abstract

Automated machine learning (AutoML) frameworks have become important tools in the data scientist's arsenal, as they dramatically reduce the manual work devoted to the construction of ML pipelines. Such frameworks intelligently search among millions of possible ML pipelines, typically containing feature engineering, model selection, and hyperparameter tuning steps, and finally output an optimal pipeline in terms of predictive accuracy. However, when the dataset is large, each individual configuration takes longer to execute, so the overall AutoML running time becomes increasingly high. To this end, we present SubStrat, an AutoML optimization strategy that tackles the data size rather than the configuration space. It wraps existing AutoML tools, and instead of executing them directly on the entire dataset, SubStrat uses a genetic-based algorithm to find a small yet representative…
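To make the subset-selection idea concrete, here is a minimal sketch of a genetic search for a representative row subset. It is an illustration under stated assumptions, not the authors' implementation: the fitness function (closeness of subset column means to full-data column means), the operators, and all names (`select_subset`, `fitness`, `crossover`, `mutate`) are hypothetical, and the sketch selects rows only.

```python
import random
from statistics import mean

def column_means(rows):
    # Mean of every column; rows is a list of equal-length numeric lists.
    return [mean(col) for col in zip(*rows)]

def fitness(indices, data, full_means):
    # ASSUMPTION: "representative" is scored as closeness of column means.
    # Higher is better, 0 is a perfect match.
    subset_means = column_means([data[i] for i in indices])
    return -sum(abs(s - f) for s, f in zip(subset_means, full_means))

def crossover(parent_a, parent_b, k, rng):
    # Child draws k distinct row indices from the union of both parents.
    pool = sorted(set(parent_a) | set(parent_b))
    return rng.sample(pool, k)

def mutate(indices, n_rows, rng):
    # Swap one selected row for a row that is currently not selected.
    chosen = set(indices)
    outside = [i for i in range(n_rows) if i not in chosen]
    if not outside:
        return list(indices)
    mutated = list(indices)
    mutated[rng.randrange(len(mutated))] = rng.choice(outside)
    return mutated

def select_subset(data, k, generations=30, pop_size=20, mutation_rate=0.3, seed=0):
    # Hypothetical driver: evolve fixed-size index subsets, keep the fittest.
    rng = random.Random(seed)
    n_rows = len(data)
    full_means = column_means(data)
    score = lambda ind: fitness(ind, data, full_means)
    population = [rng.sample(range(n_rows), k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        survivors = population[: pop_size // 2]  # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = crossover(a, b, k, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, n_rows, rng)
            children.append(child)
        population = survivors + children
    return sorted(max(population, key=score))

# Toy dataset: 100 rows, 2 numeric columns.
data = [[float(i), float(i % 7)] for i in range(100)]
subset = select_subset(data, k=10)
small_data = [data[i] for i in subset]
```

In a wrapper of this kind, the AutoML tool would then be run on `small_data` instead of `data`. Because the top half of each generation survives unchanged, the best fitness found never decreases from one generation to the next.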

Code & Models

Repositories

teddy4445/substrat (Official)

Taxonomy

Topics: Machine Learning and Data Classification · Data Stream Mining Techniques · Metaheuristic Optimization Algorithms Research