Which is the best model for my data?

Gonzalo Nápoles; Isel Grau; Çiçek Güven; Orçun Özdemir; Yamisleydi Salgueiro

arXiv:2210.14687 · cs.LG · October 27, 2022

TL;DR

This paper presents a meta-learning approach that accurately predicts the best classification model and hyperparameters for a dataset, using a novel set of meta-features and synthetic data augmentation.

Contribution

It introduces a unified meta-learning framework for simultaneous model selection and hyperparameter tuning, utilizing new meta-features and synthetic data generation.
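To make "simultaneous model selection and hyperparameter tuning" concrete, a model can be treated as a (classifier, hyperparameter configuration) pair, and each dataset can be labeled with every pair that turns out to be optimal for it. The sketch below is only an illustration under stated assumptions: the search space, the `tolerance` parameter, and the placeholder scores are hypothetical and are not taken from the paper.

```python
from itertools import product

# Hypothetical search space: classifier name -> hyperparameter grid.
SEARCH_SPACE = {
    "knn": {"n_neighbors": [1, 5]},
    "tree": {"max_depth": [2, 4]},
}


def expand_grid(space):
    """Enumerate every (classifier, hyperparameter configuration) pair."""
    models = []
    for name, grid in space.items():
        keys = sorted(grid)
        for combo in product(*(grid[k] for k in keys)):
            models.append((name, tuple(zip(keys, combo))))
    return models


def multilabel_row(scores, tolerance=0.0):
    """Label as optimal (1) every model scoring within `tolerance` of the best."""
    best = max(scores.values())
    return {model: int(best - score <= tolerance) for model, score in scores.items()}


models = expand_grid(SEARCH_SPACE)

# Placeholder numbers standing in for cross-validated accuracy on one dataset;
# in practice these would come from fitting each model during grid search.
placeholder_scores = [0.81, 0.90, 0.90, 0.74]
scores = dict(zip(models, placeholder_scores))

row = multilabel_row(scores)
for model, is_optimal in row.items():
    print(model, is_optimal)
```

Because two configurations tie for the best score here, the row carries two positive labels, which is one reason a multilabel target may suit this problem better than a single-class one.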

Findings

1. Achieves 91% accuracy on synthetic datasets

2. Achieves 87% accuracy on real-world datasets

3. Meta-features improve classifier performance

Abstract

In this paper, we tackle the problem of selecting the optimal model for a given structured pattern classification dataset. In this context, a model can be understood as a classifier and a hyperparameter configuration. The proposed meta-learning approach relies purely on machine learning and involves four major steps. Firstly, we present a concise collection of 62 meta-features that address the problem of information cancellation when aggregating measure values that involve both positive and negative measurements. Secondly, we describe two different approaches for synthetic data generation intended to enlarge the training data. Thirdly, we fit a set of pre-defined classification models for each classification problem while optimizing their hyperparameters using grid search. The goal is to create a meta-dataset such that each row denotes a multilabel instance describing a specific problem. The…
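The information cancellation mentioned in the abstract can be illustrated with a small example: when a signed measure (for instance, a correlation coefficient) is averaged over many features, positive and negative values can offset each other and hide structure that is actually present. The sketch below shows one plausible remedy, aggregating the positive and negative parts separately. The function name and the sample values are hypothetical, and this is not necessarily the exact formulation behind the paper's 62 meta-features.

```python
from statistics import mean


def aggregate_signed(values):
    """Summarize signed measurements without letting opposite signs cancel."""
    positives = [v for v in values if v > 0]
    negatives = [v for v in values if v < 0]
    return {
        "mean_all": mean(values) if values else 0.0,
        "mean_pos": mean(positives) if positives else 0.0,
        "mean_neg": mean(negatives) if negatives else 0.0,
    }


# Hypothetical pairwise correlations for one dataset: strong relationships
# in both directions, so the plain mean collapses to zero.
correlations = [0.9, -0.9, 0.8, -0.8]
summary = aggregate_signed(correlations)
print(summary)
```

The plain mean suggests no correlation at all, while the two sign-specific means preserve the fact that the dataset contains strong positive and strong negative relationships.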

Taxonomy

Topics: Machine Learning and Data Classification