Transforming Datasets to Requested Complexity with Projection-based Many-Objective Genetic Algorithm
Joanna Komorniczak

TL;DR
This paper introduces a projection-based genetic algorithm that transforms synthetic datasets to match desired complexity levels, aiding in the evaluation of machine learning models across diverse problem difficulties.
Contribution
It presents a novel genetic algorithm that optimizes dataset complexity measures via linear projections, enabling controlled generation of datasets with specific difficulty levels.
Findings
Generated datasets with targeted complexity levels.
Strong correlation between data complexity and classifier/regressor performance.
Effective transformation of synthetic data to desired complexity targets.
Abstract
The research community continues to seek increasingly advanced synthetic data generators to reliably evaluate the strengths and limitations of machine learning methods. This work aims to increase the availability of datasets encompassing a diverse range of problem complexities by proposing a genetic algorithm that optimizes a set of problem complexity measures for classification and regression tasks towards specific targets. For classification, a set of 10 complexity measures was used, while for regression tasks, 4 measures demonstrating promising optimization capabilities were selected. Experiments confirmed that the proposed genetic algorithm can generate datasets with varying levels of difficulty by transforming synthetically created datasets to achieve target complexity values through linear feature projections. Evaluations involving state-of-the-art classifiers and regressors…
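The core idea in the abstract, searching for a linear feature projection that moves a dataset's complexity toward a target value, can be illustrated with a heavily simplified sketch. This is not the author's implementation: it optimizes a single Fisher-ratio-based score (in the spirit of the F1 complexity measure) instead of the paper's 10 classification measures, uses a plain single-objective genetic algorithm with elitism rather than a many-objective one, and every name and parameter (`complexity`, `evolve_projection`, `pop_size`, `sigma`, and so on) is hypothetical.

```python
import numpy as np


def fisher_ratio(X, y):
    """Largest per-feature Fisher discriminant ratio for binary labels 0/1."""
    a, b = X[y == 0], X[y == 1]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0) + 1e-12  # avoid division by zero
    return float(np.max(num / den))


def complexity(X, y):
    """Assumed stand-in measure in (0, 1]; higher means harder to separate."""
    return 1.0 / (1.0 + fisher_ratio(X, y))


def evolve_projection(X, y, target, pop_size=40, generations=60,
                      sigma=0.3, seed=0):
    """Evolve a d x d projection P so that complexity(X @ P, y) nears target."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pop = rng.normal(size=(pop_size, d, d))
    pop[0] = np.eye(d)  # keep the untransformed dataset as one candidate

    def error(P):
        return abs(complexity(X @ P, y) - target)

    for _ in range(generations):
        errs = np.array([error(P) for P in pop])
        # Elitism: the better half survives unchanged.
        elite = pop[np.argsort(errs)[: pop_size // 2]]
        n_children = pop_size - len(elite)
        # Uniform crossover between two randomly chosen elite parents.
        pa = elite[rng.integers(len(elite), size=n_children)]
        pb = elite[rng.integers(len(elite), size=n_children)]
        mask = rng.random((n_children, d, d)) < 0.5
        children = np.where(mask, pa, pb)
        # Gaussian mutation of every matrix entry.
        children = children + rng.normal(scale=sigma, size=children.shape)
        pop = np.concatenate([elite, children])

    errs = np.array([error(P) for P in pop])
    best = int(np.argmin(errs))
    return pop[best], float(errs[best])


# Toy data: two classes separated along feature 0 only (an easy problem).
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 3))
X[:, 0] += 4.0 * y

target = 0.5  # requested complexity, harder than the original data
P, err = evolve_projection(X, y, target)
print("original complexity:", complexity(X, y))
print("projected complexity:", complexity(X @ P, y), "error:", err)
```

Because the identity matrix is seeded into the initial population and the best half is always retained, the returned projection is never further from the target than the untransformed data. Extending the sketch toward the paper's setting would mean replacing the scalar error with a vector of per-measure errors and using a many-objective selection scheme.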
Taxonomy
Topics: Machine Learning and Data Classification · Evolutionary Algorithms and Applications · Rough Sets and Fuzzy Logic
