Assembled-OpenML: Creating Efficient Benchmarks for Ensembles in AutoML with OpenML
Lennart Purucker, Joeran Beel

TL;DR
This paper introduces Assembled-OpenML, a Python tool that creates meta-datasets from OpenML to enable efficient comparison of ensemble techniques in AutoML by using stored predictions instead of retraining models.
Contribution
The paper presents a novel Python tool, Assembled-OpenML, which simplifies and accelerates the comparison of ensemble methods in AutoML through meta-datasets derived from OpenML data.
Findings
Assembled-OpenML makes comparing ensemble techniques computationally cheaper by reusing stored predictions instead of training and evaluating base models.
The tool efficiently creates meta-datasets from OpenML predictions.
In the example comparison, gathering the prediction data of 1523 base models for 31 datasets took about 1 hour.
Abstract
Automated Machine Learning (AutoML) frameworks regularly use ensembles. Developers need to compare different ensemble techniques to select appropriate techniques for an AutoML framework from the many potential techniques. So far, the comparison of ensemble techniques is often computationally expensive, because many base models must be trained and evaluated one or multiple times. Therefore, we present Assembled-OpenML. Assembled-OpenML is a Python tool, which builds meta-datasets for ensembles using OpenML. A meta-dataset, called Metatask, consists of the data of an OpenML task, the task's dataset, and prediction data from model evaluations for the task. We can make the comparison of ensemble techniques computationally cheaper by using the predictions stored in a metatask instead of training and evaluating base models. To introduce Assembled-OpenML, we describe the first version of our…
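The idea of comparing ensemble techniques on stored predictions can be illustrated with a minimal sketch. It does not use the Assembled-OpenML API: the `ToyMetatask` class, the model names, and the probability values below are hypothetical stand-ins made up for illustration. The sketch only shows that, once per-instance predictions of base models are available, an ensemble (here, simple probability averaging) can be scored without training anything.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyMetatask:
    """Hypothetical stand-in for a metatask: true labels plus stored predictions."""
    y_true: List[int]
    # base model name -> per-instance class probabilities
    stored_proba: Dict[str, List[List[float]]]


def argmax(row: List[float]) -> int:
    """Index of the largest probability in one prediction row."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(y_true: List[int], proba: List[List[float]]) -> float:
    """Fraction of instances whose most probable class matches the label."""
    hits = sum(argmax(p) == y for y, p in zip(y_true, proba))
    return hits / len(y_true)


def average_ensemble(task: ToyMetatask, model_names: List[str]) -> List[List[float]]:
    """Average the stored class probabilities of the chosen base models."""
    n = len(model_names)
    averaged = []
    for i in range(len(task.y_true)):
        rows = [task.stored_proba[name][i] for name in model_names]
        averaged.append([sum(col) / n for col in zip(*rows)])
    return averaged


# Made-up predictions of three base models on four instances (two classes).
task = ToyMetatask(
    y_true=[0, 1, 1, 0],
    stored_proba={
        "model_a": [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.7, 0.3]],
        "model_b": [[0.4, 0.6], [0.1, 0.9], [0.3, 0.7], [0.8, 0.2]],
        "model_c": [[0.6, 0.4], [0.3, 0.7], [0.6, 0.4], [0.6, 0.4]],
    },
)

# Score every base model and the averaging ensemble from stored predictions only.
for name, proba in task.stored_proba.items():
    print(name, accuracy(task.y_true, proba))
ensemble_proba = average_ensemble(task, list(task.stored_proba))
print("average ensemble", accuracy(task.y_true, ensemble_proba))
```

Because no base model is fitted, the cost of each comparison is only the cost of combining arrays of predictions. A real benchmark would additionally separate the data used to build or select an ensemble from the data used to score it.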
Taxonomy
Topics: Machine Learning and Data Classification · Software Engineering Research
Methods: Balanced Selection
