EvoSplit: An evolutionary approach to split a multi-label data set into disjoint subsets
Francisco Florez-Revuelta

TL;DR
EvoSplit introduces an evolutionary algorithm to improve the partitioning of multi-label datasets into disjoint subsets, outperforming traditional methods like iterative stratification in maintaining label distributions.
Contribution
The paper presents a novel evolutionary approach, including single-objective and multi-objective algorithms, for better multi-label dataset splitting, validated on diverse datasets.
Findings
EvoSplit outperforms iterative stratification in preserving label distributions.
The multi-objective approach effectively balances label and label pair similarities.
Both approaches are validated on small and large datasets, including image datasets used in computer vision.
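To make the idea of balancing two objectives concrete, here is a minimal sketch of Pareto dominance, a common tool in multi-objective optimization. The summary does not say how EvoSplit ranks candidate splits, so this is an assumption for illustration only. The function names and score values are hypothetical; each tuple stands for a (label cost, label-pair cost) pair, where lower is better.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one.

    Both objectives are treated as costs to minimize (an assumption of this sketch).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical (label_cost, label_pair_cost) values for five candidate splits.
scores = [(0.10, 0.30), (0.20, 0.20), (0.30, 0.10), (0.25, 0.25), (0.15, 0.35)]
front = pareto_front(scores)
```

In this toy example the splits left in `front` are trade-offs: improving one of them on label similarity would cost it on label-pair similarity, so none is strictly preferable to the others.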
Abstract
This paper presents a new evolutionary approach, EvoSplit, for the distribution of multi-label data sets into disjoint subsets for supervised machine learning. Currently, data set providers divide a data set either randomly or by iterative stratification, a method that aims to maintain the label (or label pair) distribution of the original data set in the different subsets. Following the same aim, this paper first introduces a single-objective evolutionary approach that tries to obtain a split that maximizes the similarity between those distributions independently. Second, a new multi-objective evolutionary algorithm is presented to maximize the similarity considering simultaneously both distributions (labels and label pairs). Both approaches are validated using well-known multi-label data sets as well as large image data sets currently used in computer vision and machine learning…
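The single-objective idea can be sketched in a few lines. This is not the paper's algorithm: the cost function (mean absolute gap between per-label frequencies in each subset and in the whole data set), the swap mutation, the accept-if-not-worse rule, and all names below are assumptions chosen to keep the example short.

```python
import random


def label_frequencies(rows):
    """Fraction of samples carrying each label; each row is a list of 0/1 flags."""
    n_labels = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(n_labels)]


def split_cost(data, assignment, n_subsets):
    """Mean absolute gap between subset and whole-set label frequencies (lower is better)."""
    target = label_frequencies(data)
    total = 0.0
    for s in range(n_subsets):
        rows = [row for row, a in zip(data, assignment) if a == s]
        if not rows:
            return float("inf")  # an empty subset is never acceptable
        freqs = label_frequencies(rows)
        total += sum(abs(f - t) for f, t in zip(freqs, target)) / len(target)
    return total / n_subsets


def evolve_split(data, n_subsets=2, generations=2000, seed=0):
    """Hill-climbing style evolutionary search over sample-to-subset assignments."""
    rng = random.Random(seed)
    # Start from a random split with (near-)equal subset sizes.
    assignment = [i % n_subsets for i in range(len(data))]
    rng.shuffle(assignment)
    best = split_cost(data, assignment, n_subsets)
    for _ in range(generations):
        # Mutation: swap two samples from different subsets, so sizes never change.
        i, j = rng.sample(range(len(data)), 2)
        if assignment[i] == assignment[j]:
            continue
        assignment[i], assignment[j] = assignment[j], assignment[i]
        cost = split_cost(data, assignment, n_subsets)
        if cost <= best:
            best = cost  # keep the mutated split
        else:
            assignment[i], assignment[j] = assignment[j], assignment[i]  # revert
    return assignment, best


# Toy data set: 8 samples, 3 labels.
data = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 1, 1],
        [1, 0, 1], [0, 0, 1], [1, 1, 1], [0, 0, 0]]
assignment, cost = evolve_split(data, n_subsets=2, generations=500)
```

Because the mutation only swaps samples between subsets, the subset sizes fixed at initialization are preserved, and the search never returns a split worse than the random one it started from.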
