# Dataset2Vec: Learning Dataset Meta-Features

**Authors:** Hadi S. Jomaa, Lars Schmidt-Thieme, Josif Grabocka

arXiv: 1905.11063 · 2021-01-12

## TL;DR

This paper introduces Dataset2Vec, a deep-learning method that learns dataset meta-features for meta-learning. On a hyperparameter optimization task over 120 UCI datasets, the learned meta-features outperform traditional engineered ones.

## Contribution

The paper proposes Dataset2Vec, a deep neural network that learns meta-features directly from datasets. It also introduces dataset similarity learning, an auxiliary meta-learning task with abundant training data that asks whether two batches come from the same dataset.

## Key findings

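- Dataset2Vec meta-features outperform expert-engineered meta-features on a large-scale hyperparameter optimization task over 120 UCI datasets.
- The learned meta-features apply to datasets with varying schemas, i.e., different numbers of predictors and targets.
- According to the authors, this is the first demonstration that learned meta-features are useful for datasets with varying schemas.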

## Abstract

Meta-learning, or learning to learn, is a machine learning approach that utilizes prior learning experiences to expedite the learning process on unseen tasks. As a data-driven approach, meta-learning requires meta-features that represent the primary learning tasks or datasets, and are estimated traditionally as engineered dataset statistics that require expert domain knowledge tailored for every meta-task. In this paper, first, we propose a meta-feature extractor called Dataset2Vec that combines the versatility of engineered dataset meta-features with the expressivity of meta-features learned by deep neural networks. Primary learning tasks or datasets are represented as hierarchical sets, i.e., as a set of sets, esp. as a set of predictor/target pairs, and then a DeepSet architecture is employed to regress meta-features on them. Second, we propose a novel auxiliary meta-learning task with abundant data called dataset similarity learning that aims to predict if two batches stem from the same dataset or different ones. In an experiment on a large-scale hyperparameter optimization task for 120 UCI datasets with varying schemas as a meta-learning task, we show that the meta-features of Dataset2Vec outperform the expert engineered meta-features and thus demonstrate the usefulness of learned meta-features for datasets with varying schemas for the first time.
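To make the "set of sets" idea concrete, the NumPy sketch below treats a dataset with N rows, M predictors and T targets as M×T sets of N value pairs (x_{n,m}, y_{n,t}). It mean-pools three small networks in turn: over instances, then over predictor/target pairs, then maps the result to a fixed-size vector. The sketch also scores two batches with a similarity in (0, 1] and applies a binary cross-entropy for the same-dataset/different-dataset label. This is a minimal illustration, not the authors' implementation. The three-network mean-pooled form, the `exp(-gamma * distance)` similarity, the batch sampler, and every name here (`Dataset2VecSketch`, `similarity`, `sample_batch`, `pair_loss`) are assumptions. The weights are random and untrained, and no training loop is shown. See the full paper for the exact architecture and loss.

```python
import numpy as np

def mlp(sizes, rng):
    """Small ReLU MLP with random, untrained weights (illustration only)."""
    params = [(rng.normal(0.0, 1.0 / np.sqrt(i), (i, o)), np.zeros(o))
              for i, o in zip(sizes[:-1], sizes[1:])]
    def forward(z):
        for k, (w, b) in enumerate(params):
            z = z @ w + b
            if k < len(params) - 1:  # no activation on the final layer
                z = np.maximum(z, 0.0)
        return z
    return forward

class Dataset2VecSketch:
    """Hypothetical DeepSet-style extractor over a set of (predictor, target) sets."""
    def __init__(self, hidden=32, out_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.f = mlp([2, hidden, hidden], rng)        # encodes one (x_nm, y_nt) pair
        self.g = mlp([hidden, hidden, hidden], rng)   # encodes one pooled (m, t) set
        self.h = mlp([hidden, hidden, out_dim], rng)  # maps the dataset code to meta-features

    def __call__(self, X, Y):
        # X: (N, M) predictors, Y: (N, T) targets; M and T may differ per dataset.
        N, M = X.shape
        T = Y.shape[1]
        xs = np.broadcast_to(X.T[:, None, :], (M, T, N))
        ys = np.broadcast_to(Y.T[None, :, :], (M, T, N))
        pairs = np.stack([xs, ys], axis=-1)          # (M, T, N, 2)
        inner = self.f(pairs).mean(axis=2)           # pool over instances -> (M, T, hidden)
        outer = self.g(inner).mean(axis=(0, 1))      # pool over (m, t) pairs -> (hidden,)
        return self.h(outer)                         # fixed-size vector, independent of M, T, N

def similarity(phi_a, phi_b, gamma=1.0):
    """Assumed similarity exp(-gamma * ||phi_a - phi_b||), which lies in (0, 1]."""
    return float(np.exp(-gamma * np.linalg.norm(phi_a - phi_b)))

def sample_batch(X, Y, rng, max_rows=32, max_cols=4):
    """Hypothetical sampler: random subset of rows and of predictor columns."""
    rows = rng.choice(X.shape[0], size=min(max_rows, X.shape[0]), replace=False)
    k = rng.integers(1, min(max_cols, X.shape[1]) + 1)
    cols = rng.choice(X.shape[1], size=k, replace=False)
    return X[np.ix_(rows, cols)], Y[rows]

def pair_loss(model, batch_a, batch_b, same, gamma=1.0, eps=1e-12):
    """Binary cross-entropy; label is 1 when both batches come from the same dataset."""
    s = similarity(model(*batch_a), model(*batch_b), gamma)
    s = min(max(s, eps), 1.0 - eps)  # keep log() finite
    return float(-np.log(s)) if same else float(-np.log(1.0 - s))
```

For example, two datasets with different schemas map to vectors of the same length:

```python
rng = np.random.default_rng(1)
model = Dataset2VecSketch()
a = model(rng.normal(size=(50, 4)), rng.integers(0, 2, size=(50, 1)).astype(float))
b = model(rng.normal(size=(80, 7)), rng.integers(0, 2, size=(80, 3)).astype(float))
print(a.shape, b.shape)  # (16,) (16,)
```

Mean pooling makes the output invariant to the order of rows and of predictor columns. That invariance is what lets a single extractor handle datasets whose schemas differ.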

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.11063/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/1905.11063/full.md

## References

52 references — full list in the complete paper: https://tomesphere.com/paper/1905.11063/full.md

---
Source: https://tomesphere.com/paper/1905.11063