Similarity Based Stratified Splitting: an approach to train better classifiers

Felipe Farias; Teresa Ludermir; Carmelo Bastos-Filho

arXiv:2010.06099·cs.LG·October 14, 2020

Open Access

TL;DR

This paper introduces a similarity-based stratified splitting method that improves data partitioning for training classifiers, leading to more realistic performance estimates across various datasets and classifiers.

Contribution

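The paper presents Similarity-Based Stratified Splitting (SBSS), a technique that uses similarity functions among samples to place similar samples in different splits, so that each split represents the data better and performance estimates are more realistic.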

Findings

01. Outperforms ordinary stratified 10-fold cross-validation in 75% of the assessed scenarios, according to the Wilcoxon signed-rank test

02. Effective across multiple classifiers and five similarity functions

03. Provides more realistic performance estimates in real-world applications

Abstract

We propose a Similarity-Based Stratified Splitting (SBSS) technique, which uses both the output and input space information to split the data. The splits are generated using similarity functions among samples to place similar samples in different splits. This approach allows for a better representation of the data in the training phase. This strategy leads to a more realistic performance estimation when used in real-world applications. We evaluate our proposal on twenty-two benchmark datasets with classifiers such as Multi-Layer Perceptron, Support Vector Machine, Random Forest, and K-Nearest Neighbors, and five similarity functions: Cityblock, Chebyshev, Cosine, Correlation, and Euclidean. According to the Wilcoxon signed-rank test, our approach consistently outperformed ordinary stratified 10-fold cross-validation in 75% of the assessed scenarios.
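The splitting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' published algorithm: the greedy grouping rule, the function name `similarity_stratified_folds`, and its parameters are assumptions made only for illustration. Within each class, the sketch picks a pivot sample, gathers its nearest unassigned neighbours under a chosen distance, and sends each member of that group to a different fold, so the class label (output space) and the sample similarity (input space) both shape the split.

```python
import math
import random
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cityblock(a, b):
    """Cityblock (Manhattan) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def similarity_stratified_folds(X, y, n_splits=3, distance=euclidean, seed=0):
    """Return one fold index per sample (hypothetical helper, not from the paper).

    Assumed greedy rule: within each class, a pivot and its n_splits - 1
    nearest unassigned neighbours form a group, and every group member
    goes to a different fold.
    """
    rng = random.Random(seed)
    folds = [None] * len(X)

    # Output-space information: group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(y):
        by_class[label].append(index)

    next_fold = 0  # rotates over folds so fold sizes stay balanced
    for label in sorted(by_class):
        remaining = by_class[label][:]
        rng.shuffle(remaining)
        while remaining:
            pivot = remaining.pop()
            # Input-space information: order the rest by distance to the pivot.
            remaining.sort(key=lambda j: distance(X[pivot], X[j]))
            group = [pivot] + remaining[:n_splits - 1]
            remaining = remaining[n_splits - 1:]
            # Consecutive fold indices are distinct, so similar samples
            # in this group end up in different folds.
            for member in group:
                folds[member] = next_fold
                next_fold = (next_fold + 1) % n_splits
    return folds
```

Calling `similarity_stratified_folds(X, y, n_splits=10, distance=cityblock)` would give a 10-fold assignment comparable in shape to the stratified 10-fold baseline mentioned in the abstract; the other similarity functions listed there could be passed in the same way as plain Python callables.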

Taxonomy

Topics: Face and Expression Recognition · Time Series Analysis and Forecasting · Machine Learning and Data Classification