Identifying and Harnessing the Building Blocks of Machine Learning Pipelines for Sensible Initialization of a Data Science Automation Tool

Randal S. Olson; Jason H. Moore

arXiv:1607.08878·cs.NE·August 1, 2016

TL;DR

This paper improves TPOT, a genetic programming-based AutoML system, by initializing it with frequently occurring pipeline building blocks. The approach improves benchmark performance without significantly degrading performance on other tasks.

Contribution

The paper identifies the 100 most frequent short series of machine learning operations in a database of previously used pipelines and uses them to initialize TPOT's search.
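
As a rough illustration of what building-block initialization could look like, the sketch below fills part of an initial population with known building blocks and the rest with random operator sequences. This is not TPOT's actual procedure: the function `init_population`, the `seed_fraction` parameter, the representation of a pipeline as a flat list of operator names, and the operator names themselves are all assumptions made for the example.

```python
import random


def init_population(building_blocks, operators, size,
                    seed_fraction=0.5, max_len=3, rng=None):
    """Build an initial population of pipelines (lists of operator names).

    Hypothetical sketch: a fraction of the population is copied from
    known building blocks, and the remainder is generated at random.
    """
    rng = rng or random.Random()
    # Seed only if building blocks are available.
    n_seeded = int(size * seed_fraction) if building_blocks else 0
    n_seeded = max(0, min(n_seeded, size))

    # Seeded individuals: each one starts as a copy of a building block.
    population = [list(rng.choice(building_blocks)) for _ in range(n_seeded)]

    # Remaining individuals: random operator sequences of length 1..max_len.
    while len(population) < size:
        length = rng.randint(1, max_len)
        population.append([rng.choice(operators) for _ in range(length)])
    return population


# Illustrative operator names and building blocks (not taken from the paper).
operators = ["StandardScaler", "PCA", "RandomForest", "LogisticRegression"]
blocks = [("StandardScaler", "PCA"), ("PCA", "RandomForest")]

population = init_population(blocks, operators, size=10,
                             seed_fraction=0.5, rng=random.Random(0))
```

Keeping some random individuals alongside the seeded ones is one way to preserve diversity in the starting population; the right balance is a tuning choice that this sketch does not settle.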

Findings

01

Sensible initialization improves TPOT's performance on benchmarks

02

Identified 100 common pipeline building blocks

03

Sensible initialization does not significantly harm performance on other tasks

Abstract

As data science continues to grow in popularity, there will be an increasing need to make data science tools more scalable, flexible, and accessible. In particular, automated machine learning (AutoML) systems seek to automate the process of designing and optimizing machine learning pipelines. In this chapter, we present a genetic programming-based AutoML system called TPOT that optimizes a series of feature preprocessors and machine learning models with the goal of maximizing classification accuracy on a supervised classification problem. Further, we analyze a large database of pipelines that were previously used to solve various supervised classification problems and identify 100 short series of machine learning operations that appear the most frequently, which we call the building blocks of machine learning pipelines. We harness these building blocks to initialize TPOT with promising…
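
The building-block analysis described in the abstract can be pictured as a frequency count over short runs of operations. The following is a minimal sketch, assuming each pipeline is stored as a flat list of operator names and that a "short series" means a contiguous run of up to `max_len` operators. The function `count_building_blocks`, the toy database, and the operator names are hypothetical; the paper's own database and counting procedure may differ.

```python
from collections import Counter


def count_building_blocks(pipelines, max_len=3):
    """Count contiguous operator sequences of length 1..max_len.

    `pipelines` is an iterable of lists of operator names. Each
    contiguous sub-sequence is counted once per occurrence.
    """
    counts = Counter()
    for ops in pipelines:
        for n in range(1, max_len + 1):
            for i in range(len(ops) - n + 1):
                counts[tuple(ops[i:i + n])] += 1
    return counts


# Hypothetical toy database; operator names are illustrative only.
pipelines = [
    ["StandardScaler", "PCA", "RandomForest"],
    ["StandardScaler", "PCA", "LogisticRegression"],
    ["PCA", "RandomForest"],
]

counts = count_building_blocks(pipelines)
# Take the k most frequent sequences (the paper keeps 100; 3 is used here
# only because the toy database is tiny).
top_blocks = counts.most_common(3)
```

In this toy database the single operator `PCA` is the most frequent sequence because it appears in all three pipelines.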

Code & Models

Repositories

rhiever/tpot
Official
