Which is the best model for my data?
Gonzalo Nápoles, Isel Grau, Çiçek Güven, Orçun Özdemir, Yamisleydi Salgueiro

TL;DR
This paper presents a meta-learning approach that accurately predicts the best classification model and hyperparameters for a dataset, using a novel set of meta-features and synthetic data augmentation.
Contribution
It introduces a unified meta-learning framework for simultaneous model selection and hyperparameter tuning, utilizing new meta-features and synthetic data generation.
Findings
Achieves 91% accuracy on synthetic datasets
Achieves 87% accuracy on real-world datasets
Meta-features improve classifier performance
Abstract
In this paper, we tackle the problem of selecting the optimal model for a given structured pattern classification dataset. In this context, a model can be understood as a classifier and a hyperparameter configuration. The proposed meta-learning approach relies purely on machine learning and involves four major steps. Firstly, we present a concise collection of 62 meta-features that address the problem of information cancellation when aggregating measure values involving positive and negative measurements. Secondly, we describe two different approaches for synthetic data generation intending to enlarge the training data. Thirdly, we fit a set of pre-defined classification models for each classification problem while optimizing their hyperparameters using grid search. The goal is to create a meta-dataset such that each row denotes a multilabel instance describing a specific problem. The…
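To illustrate what "information cancellation" can mean here, the following is a minimal sketch and not the authors' actual meta-feature definitions. It assumes a per-feature measure (skewness is used as a hypothetical example) whose values can be positive or negative. The helper `signed_aggregate` and the sample values are invented for illustration. Averaging all values lets opposite signs cancel, whereas aggregating the positive and negative values separately is one possible way to keep that information.

```python
from statistics import mean


def signed_aggregate(values):
    """Aggregate positive and negative measurements separately.

    Hypothetical helper: returns one mean for the positive values and
    one for the negative values, using 0.0 when a group is empty.
    """
    positives = [v for v in values if v > 0]
    negatives = [v for v in values if v < 0]
    return {
        "mean_pos": mean(positives) if positives else 0.0,
        "mean_neg": mean(negatives) if negatives else 0.0,
    }


# Made-up per-feature skewness values for two very different datasets.
skew_a = [2.0, -2.0, 1.5, -1.5]  # strongly skewed features with mixed signs
skew_b = [0.0, 0.0, 0.0, 0.0]    # perfectly symmetric features

# A plain mean gives the same value for both, so the difference is lost.
naive_a = mean(skew_a)
naive_b = mean(skew_b)

# Sign-aware aggregation keeps the two datasets distinguishable.
split_a = signed_aggregate(skew_a)
split_b = signed_aggregate(skew_b)

print(naive_a, naive_b)
print(split_a, split_b)
```

In this toy case the plain mean cannot tell the two datasets apart, while the sign-aware summary can. The cost is that each signed measure contributes two meta-features instead of one.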
Taxonomy
Topics: Machine Learning and Data Classification
