Assembled-OpenML: Creating Efficient Benchmarks for Ensembles in AutoML with OpenML

Lennart Purucker; Joeran Beel

arXiv:2307.00285 · cs.LG · July 4, 2023 · 1 citation

TL;DR

This paper introduces Assembled-OpenML, a Python tool that creates meta-datasets from OpenML to enable efficient comparison of ensemble techniques in AutoML by using stored predictions instead of retraining models.

Contribution

The paper presents a novel Python tool, Assembled-OpenML, which simplifies and accelerates the comparison of ensemble methods in AutoML through meta-datasets derived from OpenML data.

Findings

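01. Assembled-OpenML reduces the time needed to compare ensemble techniques, because stored predictions replace the training and evaluation of base models.

02. The tool builds meta-datasets (metatasks) from OpenML tasks, their datasets, and stored prediction data.

03. Gathering the prediction data of 1523 base models for 31 datasets took about 1 hour.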

Abstract

Automated Machine Learning (AutoML) frameworks regularly use ensembles. Developers need to compare different ensemble techniques to select appropriate techniques for an AutoML framework from the many potential techniques. So far, the comparison of ensemble techniques is often computationally expensive, because many base models must be trained and evaluated one or multiple times. Therefore, we present Assembled-OpenML. Assembled-OpenML is a Python tool, which builds meta-datasets for ensembles using OpenML. A meta-dataset, called Metatask, consists of the data of an OpenML task, the task's dataset, and prediction data from model evaluations for the task. We can make the comparison of ensemble techniques computationally cheaper by using the predictions stored in a metatask instead of training and evaluating base models. To introduce Assembled-OpenML, we describe the first version of our…
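The core idea in the abstract, comparing ensemble techniques on stored predictions rather than on freshly trained base models, can be illustrated with a minimal sketch. Everything below is hypothetical: the `toy_metatask` dictionary is a hand-made stand-in for a metatask, not the data format or API of Assembled-OpenML, and plurality voting is used only as a simple example of an ensemble technique.

```python
from collections import Counter

# Hypothetical, hand-made stand-in for a metatask: true labels plus the
# stored predictions of three base models. This is NOT the Assembled-OpenML
# format; it only mimics "predictions instead of base models as input".
toy_metatask = {
    "y_true": [0, 1, 1, 0, 1, 0],
    "predictions": {
        "model_a": [0, 1, 0, 0, 1, 1],
        "model_b": [0, 0, 1, 0, 1, 0],
        "model_c": [1, 1, 1, 0, 0, 0],
    },
}


def accuracy(y_true, y_pred):
    """Fraction of instances where the prediction equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_vote(predictions):
    """Combine stored per-model predictions by plurality vote per instance.

    Ties are broken in favour of the label seen first among the votes.
    """
    per_instance_votes = zip(*predictions.values())
    return [Counter(votes).most_common(1)[0][0] for votes in per_instance_votes]


y_true = toy_metatask["y_true"]

# Score every base model and the ensemble without training anything.
single_scores = {
    name: accuracy(y_true, preds)
    for name, preds in toy_metatask["predictions"].items()
}
ensemble_pred = majority_vote(toy_metatask["predictions"])
ensemble_score = accuracy(y_true, ensemble_pred)

for name, score in single_scores.items():
    print(f"{name}: {score:.3f}")
print(f"majority vote: {ensemble_score:.3f}")
```

Because the ensemble consumes only prediction vectors, swapping in another technique means replacing `majority_vote` with a different function of the same inputs; no base model is retrained.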

Code & Models

Repositories

isg-siegen/assembled (Official)

Taxonomy

Topics: Machine Learning and Data Classification · Software Engineering Research

Methods: Balanced Selection