Improving Performance of a Group of Classification Algorithms Using Resampling and Feature Selection
Mehdi Naseriparsa, Amir-masoud Bidgoli, Touraj Varaee

TL;DR
This paper introduces a hybrid resampling and feature selection method that improves classification accuracy and reduces errors across multiple algorithms on a lung cancer dataset.
Contribution
It presents a novel combination of resampling, filtering, and genetic search for feature selection that outperforms existing methods in accuracy and cost.
Findings
Significant reduction in classification errors.
Improved average performance of five classifiers.
Outperforms other feature selection techniques.
Abstract
In recent years, finding meaningful patterns in huge datasets has become more challenging. Data miners try to address this problem by applying feature selection methods. In this paper we propose a new hybrid method that combines resampling, filtering of the sample domain, and a wrapper subset evaluation method with genetic search to reduce the dimensionality of the Lung-Cancer dataset obtained from the UCI Repository of Machine Learning Databases. Finally, we apply several well-known classification algorithms (Naïve Bayes, Logistic, Multilayer Perceptron, Best First Decision Tree and JRIP) to the resulting dataset and compare the results and prediction rates before and after applying our feature selection method. The results show a substantial improvement in the average performance of the five classification algorithms…
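The pipeline the abstract describes (resample, then search for a feature subset with a wrapper evaluator driven by a genetic search) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the synthetic data stands in for the Lung-Cancer dataset, bootstrap sampling stands in for the paper's resampling step, a nearest-centroid classifier stands in for the wrapped learner, and the helper names (`bootstrap_resample`, `nearest_centroid_accuracy`, `genetic_search`) and all parameter values are hypothetical.

```python
import random

def bootstrap_resample(X, y, rng):
    """Draw len(X) rows with replacement (assumed stand-in for the resampling step)."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def nearest_centroid_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given features."""
    if not features:
        return 0.0
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue  # hold out row i
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1
        # Predict the class whose centroid is closest in squared distance.
        predicted = min(
            sums,
            key=lambda c: sum(
                (X[i][f] - sums[c][k] / counts[c]) ** 2
                for k, f in enumerate(features)
            ),
        )
        correct += predicted == y[i]
    return correct / len(X)

def genetic_search(X, y, n_features, pop_size=20, generations=15, p_mut=0.05, seed=0):
    """Wrapper-style search: each individual is a boolean feature mask and
    its fitness is the accuracy of the wrapped classifier on that subset."""
    rng = random.Random(seed)

    def fitness(mask):
        return nearest_centroid_accuracy(X, y, [i for i, bit in enumerate(mask) if bit])

    # Start from the full feature set plus random masks.
    pop = [[True] * n_features]
    pop += [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        children = [ranked[0]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = a[:cut] + b[cut:]
            child = [(not bit) if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return [i for i, bit in enumerate(best) if bit], fitness(best)

# Synthetic two-class data: features 0-1 are informative, 2-9 are noise.
rng = random.Random(1)
X, y = [], []
for i in range(40):
    label = i % 2
    informative = [rng.gauss(2.0 * label, 1.0) for _ in range(2)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(8)]
    X.append(informative + noise)
    y.append(label)

Xr, yr = bootstrap_resample(X, y, rng)
baseline = nearest_centroid_accuracy(Xr, yr, list(range(10)))
selected, score = genetic_search(Xr, yr, n_features=10)
print("all features:", baseline, "| selected", selected, "->", score)
```

Because the initial population contains the full feature mask and the best individual is carried over unchanged each generation, the selected subset never scores below the all-features baseline under this evaluator. Note that evaluating on bootstrapped data lets duplicated rows appear on both sides of a leave-one-out split, so the scores here are optimistic; a real comparison would hold out data that never entered the resampling or the search.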
Taxonomy
Topics: Data Mining Algorithms and Applications · Text and Document Classification Technologies · Imbalanced Data Classification Techniques
