Characterizing classification datasets: a study of meta-features for meta-learning
Adriano Rivolli, Luís P. F. Garcia, Carlos Soares, Joaquin Vanschoren, André C. P. L. F. de Carvalho

TL;DR
This paper standardizes and systematizes meta-features for classification datasets in meta-learning, introduces a new extraction tool, and promotes reproducibility in empirical research.
Contribution
It provides a standardized framework for dataset meta-features, introduces the MFE tool for extracting them, and offers guidelines to improve reproducibility in meta-learning studies.
Findings
Developed a comprehensive set of standardized meta-features
Created MFE, a tool for extracting meta-features from datasets
Identified reproducibility issues in existing meta-learning research
Abstract
Meta-learning is increasingly used to support the recommendation of machine learning algorithms and their configurations. Such recommendations are made based on meta-data, consisting of performance evaluations of algorithms on prior datasets, as well as characterizations of these datasets. These characterizations, also called meta-features, describe properties of the data which are predictive for the performance of machine learning algorithms trained on them. Unfortunately, despite being used in a large number of studies, meta-features are not uniformly described, organized and computed, making many empirical studies irreproducible and hard to compare. This paper aims to deal with this by systematizing and standardizing data characterization measures for classification datasets used in meta-learning. Moreover, it presents MFE, a new tool for extracting meta-features from datasets and…
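To make the notion of a meta-feature concrete, the following is a minimal sketch that computes a few simple and information-theoretic descriptors of a classification dataset. It is not the MFE tool or its API: the function name `simple_meta_features`, the returned keys, and the toy data are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def simple_meta_features(X, y):
    """Compute a few illustrative dataset descriptors (hypothetical helper).

    X: list of rows, each a list of attribute values.
    y: list of class labels, one per row.
    """
    if not X or len(X) != len(y):
        raise ValueError("X must be non-empty and aligned with y")

    n_instances = len(X)
    n_attributes = len(X[0])
    class_counts = Counter(y)

    # Shannon entropy of the class distribution, in bits.
    class_entropy = -sum(
        (count / n_instances) * math.log2(count / n_instances)
        for count in class_counts.values()
    )

    return {
        "n_instances": n_instances,
        "n_attributes": n_attributes,
        "n_classes": len(class_counts),
        "attr_to_inst": n_attributes / n_instances,
        "class_entropy": class_entropy,
    }


# Toy dataset: four instances, two attributes, two balanced classes.
X = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.0]]
y = ["a", "a", "b", "b"]
features = simple_meta_features(X, y)
```

In a meta-learning setting, a vector like `features` would be computed for each prior dataset and paired with algorithm performance results to form the meta-data described above.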
Taxonomy
Topics: Machine Learning and Data Classification · Data Stream Mining Techniques · Imbalanced Data Classification Techniques
