TL;DR
This paper improves TPOT, a genetic programming-based AutoML system, by initializing it with common pipeline building blocks, which improves benchmark performance without significantly harming performance on other tasks.
Contribution
The paper analyzes a large database of previously used pipelines, identifies the short series of machine learning operations that appear most frequently, and uses these building blocks to initialize TPOT.
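To make the idea of initializing from building blocks concrete, here is a minimal sketch, not the paper's or TPOT's actual implementation. It represents a pipeline as a plain list of operator names and fills part of a starting population with copies of known building blocks, and the rest with random pipelines. The function name `seeded_population`, the `seed_fraction` and `max_len` parameters, and the operator names in the usage example are all hypothetical, chosen for illustration.

```python
import random


def seeded_population(building_blocks, operators, size,
                      seed_fraction=0.5, max_len=3, rng=None):
    """Build a starting population of pipelines (lists of operator names).

    Assumption: a fixed fraction of individuals is copied from the building
    blocks and the rest are random. The source does not state how the seeded
    and random individuals are mixed.
    """
    rng = rng or random.Random(0)
    n_seeded = min(int(size * seed_fraction), size)

    # Seeded individuals: each one is a copy of a randomly chosen building block.
    population = [list(rng.choice(building_blocks)) for _ in range(n_seeded)]

    # Remaining individuals: random operator sequences of length 1..max_len.
    while len(population) < size:
        length = rng.randint(1, max_len)
        population.append([rng.choice(operators) for _ in range(length)])
    return population


# Usage with made-up operator names.
blocks = [("StandardScaler", "LogisticRegression")]
ops = ["PCA", "RandomForest", "StandardScaler"]
pop = seeded_population(blocks, ops, size=10)
```

A real genetic programming system would also have to ensure each pipeline is valid (for example, that it ends in a classifier); this sketch skips that check.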
Findings
Sensible initialization improves TPOT's performance on benchmarks
Identified 100 common pipeline building blocks
Initialization does not significantly harm performance on other tasks
Abstract
As data science continues to grow in popularity, there will be an increasing need to make data science tools more scalable, flexible, and accessible. In particular, automated machine learning (AutoML) systems seek to automate the process of designing and optimizing machine learning pipelines. In this chapter, we present a genetic programming-based AutoML system called TPOT that optimizes a series of feature preprocessors and machine learning models with the goal of maximizing classification accuracy on a supervised classification problem. Further, we analyze a large database of pipelines that were previously used to solve various supervised classification problems and identify 100 short series of machine learning operations that appear the most frequently, which we call the building blocks of machine learning pipelines. We harness these building blocks to initialize TPOT with promising…
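The building-block analysis described in the abstract can be illustrated with a small sketch. It assumes each pipeline is an ordered list of operator names and that a "short series" means a contiguous run of up to `max_len` operators; both are assumptions for illustration, since the abstract does not spell out the exact procedure. The function name `frequent_building_blocks` and the toy pipelines are hypothetical.

```python
from collections import Counter


def frequent_building_blocks(pipelines, max_len=3, top_k=100):
    """Count contiguous operator sequences and return the most frequent ones.

    Assumption: building blocks are contiguous subsequences of length
    1..max_len; the source only says "short series of machine learning
    operations that appear the most frequently".
    """
    counts = Counter()
    for pipeline in pipelines:
        for n in range(1, max_len + 1):
            # Slide a window of width n over the pipeline.
            for i in range(len(pipeline) - n + 1):
                counts[tuple(pipeline[i:i + n])] += 1
    return counts.most_common(top_k)


# Toy database of pipelines (made-up data, not from the paper).
toy_pipelines = [
    ["StandardScaler", "PCA", "LogisticRegression"],
    ["StandardScaler", "PCA", "RandomForest"],
    ["PCA", "LogisticRegression"],
]
top_blocks = frequent_building_blocks(toy_pipelines, max_len=2, top_k=5)
```

With `top_k=100`, the same function would return a list comparable in size to the 100 building blocks the paper reports, although the paper's own counting rules may differ.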
