Classifier Selection with Permutation Tests

Marta Arias; Argimiro Arratia; Ariel Duarte-Lopez

arXiv:1711.09708 · cs.IR · November 28, 2017

Open Access

TL;DR

This paper introduces a content-based recommender system for machine learning classifiers that uses permutation tests to better predict the most suitable classifier for a new dataset, improving recommendation quality.

Contribution

It proposes using permutation tests to assess classifiers for recommendation, and reports better recommendation quality than assessment with the more commonly used F-score.

Findings

01 Permutation tests improve classifier recommendation quality.

02 Extensive experiments with 8 classifiers and 65 datasets validate the approach.

03 Permutation-based assessment outperforms F-score in this context.

Abstract

This work presents a content-based recommender system for machine learning classifier algorithms. Given a new data set, a recommendation of what classifier is likely to perform best is made based on classifier performance over similar known data sets. This similarity is measured according to a data set characterization that includes several state-of-the-art metrics taking into account physical structure, statistics, and information theory. A novelty with respect to prior work is the use of a robust approach based on permutation tests to directly assess whether a given learning algorithm is able to exploit the attributes in a data set to predict class labels, and compare it to the more commonly used F-score metric for evaluating classifier performance. To evaluate our approach, we have conducted an extensive experimentation including 8 of the main machine learning classification…
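To make the permutation-test idea concrete, here is a minimal sketch of a generic label-permutation test. It is not the authors' procedure: the abstract does not specify their test statistic, resampling scheme, or classifiers, so the nearest-centroid classifier, the even/odd holdout split, and the names `nearest_centroid_fit_predict`, `holdout_accuracy`, and `permutation_test` are all assumptions made for illustration. The idea is that if a classifier really exploits the attributes, its accuracy on the true labels should rarely be matched once the labels are shuffled.

```python
import random


def nearest_centroid_fit_predict(X_train, y_train, X_test):
    """Hypothetical stand-in classifier: predict the label of the nearest class centroid."""
    centroids = {}
    for label in sorted(set(y_train)):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(col) / len(col) for col in zip(*rows)]

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return [min(centroids, key=lambda c: dist2(x, centroids[c])) for x in X_test]


def holdout_accuracy(X, y, fit_predict):
    """Accuracy on a fixed split (assumption): even rows train, odd rows test."""
    preds = fit_predict(X[0::2], y[0::2], X[1::2])
    y_test = y[1::2]
    return sum(p == t for p, t in zip(preds, y_test)) / len(y_test)


def permutation_test(X, y, fit_predict, n_permutations=200, seed=0):
    """Return (observed accuracy, p-value) under the null that labels are unrelated to attributes."""
    rng = random.Random(seed)
    observed = holdout_accuracy(X, y, fit_predict)
    at_least_as_good = 0
    for _ in range(n_permutations):
        y_perm = y[:]          # copy so the true labels are left untouched
        rng.shuffle(y_perm)    # break any attribute-label relationship
        if holdout_accuracy(X, y_perm, fit_predict) >= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (at_least_as_good + 1) / (n_permutations + 1)
    return observed, p_value


# Toy data (made up for illustration): two well-separated classes along one attribute.
X = [[0.1 * i, 0.0] for i in range(20)] + [[5.0 + 0.1 * i, 0.0] for i in range(20)]
y = [0] * 20 + [1] * 20

observed, p_value = permutation_test(X, y, nearest_centroid_fit_predict)
print(f"accuracy={observed:.2f}, p-value={p_value:.4f}")
```

A small p-value suggests the classifier is picking up real structure in the attributes. In a recommender like the one described, such a per-classifier score could be recorded for each known data set in place of, or alongside, an F-score.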

Taxonomy

Topics: Imbalanced Data Classification Techniques · Machine Learning and Data Classification · Text and Document Classification Technologies