Synthetic Data for Feature Selection

Firuz Kamalov; Hana Sulieman; Aswani Kumar Cherukuri

arXiv:2211.03035·cs.LG·November 8, 2022

TL;DR

This paper introduces a set of synthetic datasets based on electronics applications to serve as standardized benchmarks for evaluating feature selection algorithms in machine learning.

Contribution

It provides a collection of synthetic datasets with controlled parameters, specifically designed for benchmarking feature selection methods, and makes them publicly available.

Findings

1. Synthetic datasets allow precise evaluation of the selected features and control over the data parameters.

2. Testing several popular feature selection algorithms on one of the datasets illustrates their utility.

3. The datasets are publicly available on GitHub for research use.

Abstract

Feature selection is an important and active field of research in machine learning and data science. Our goal in this paper is to propose a collection of synthetic datasets that can be used as a common reference point for feature selection algorithms. Synthetic datasets allow for precise evaluation of selected features and control of the data parameters for comprehensive assessment. The proposed datasets are based on applications from electronics in order to mimic real-life scenarios. To illustrate the utility of the proposed data, we employ one of the datasets to test several popular feature selection algorithms. The datasets are made publicly available on GitHub and can be used by researchers to evaluate feature selection algorithms.
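
The evaluation idea in the abstract can be illustrated with a small Python sketch. This is not the authors' data or code: the target rule `(x0 AND x1) OR x2`, the function names, the sample sizes, and the choice of a mutual-information ranking as the selection method are all assumptions made for illustration. The point is only that, when the relevant features are known by construction, the output of a selector can be scored exactly.

```python
import math
import random


def make_dataset(n_samples=5000, n_noise=7, seed=0):
    """Binary features: columns 0-2 drive the target, the rest are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        row = [rng.randint(0, 1) for _ in range(3 + n_noise)]
        # Hypothetical logic-circuit target (an assumption, not from the paper).
        y.append((row[0] & row[1]) | row[2])
        X.append(row)
    relevant = {0, 1, 2}  # ground truth, known because the data is synthetic
    return X, y, relevant


def mutual_information(xs, ys):
    """Empirical mutual information, in bits, between two discrete sequences."""
    n = len(xs)
    joint, px, py = {}, {}, {}
    for a, b in zip(xs, ys):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def select_top_k(X, y, k):
    """Simple filter method: keep the k features with the highest MI with y."""
    n_features = len(X[0])
    scores = [
        mutual_information([row[j] for row in X], y) for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def selection_precision(selected, relevant):
    """Fraction of the selected features that are truly relevant."""
    return len(selected & relevant) / len(selected)


if __name__ == "__main__":
    X, y, relevant = make_dataset()
    selected = select_top_k(X, y, k=3)
    print("selected features:", sorted(selected))
    print("precision:", selection_precision(selected, relevant))
```

Swapping `select_top_k` for another selector while keeping `make_dataset` and `selection_precision` fixed gives a like-for-like comparison, which is the role the abstract assigns to a common synthetic benchmark.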

Code & Models

Repositories

group-automorphism/synthetic_data (official)

Taxonomy

Topics: Evolutionary Algorithms and Applications · Neural Networks and Applications · Fuzzy Logic and Control Systems

Methods: Test · Feature Selection