Feature Selection Approach with Missing Values Conducted for Statistical Learning: A Case Study of Entrepreneurship Survival Dataset

Diego Nascimento; Anderson Ara; Francisco Louzada Neto

arXiv:1810.01061 · stat.ML · October 3, 2018

Open Access

TL;DR

This study compares data imputation techniques and feature selection methods to improve the prediction of entrepreneurship survival using various machine learning classifiers on a Brazilian dataset.

Contribution

It introduces a comprehensive comparison of imputation methods and feature selection for predicting small business survival, which is novel in this context.

Findings

1. KNN imputation outperforms mean and EM methods.
2. Logistic regression and SVM achieve higher accuracy.
3. Feature selection enhances model performance.

Abstract

In this article, we investigate which features best discriminate survival among micro and small enterprises (MSE), using a data mining approach with feature selection. Given the complexity of the data set, we compare three data imputation methods, namely mean imputation (MI), k-nearest neighbor (KNN) and expectation maximization (EM), each combined with variable selection by t-test. The resulting data are then passed to four classification methods (logistic regression, naive Bayes, linear discriminant analysis and support vector machine), whose performances are compared. The experimental results are intended to support the development of a model to predict MSE survival and to provide a better understanding of the topic, since these businesses account for a significant part of Brazil's GDP and macroeconomy.
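The first two stages named in the abstract, imputation followed by t-test variable selection, can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the function names (`mean_impute`, `welch_t`, `rank_features`) and the choice of Welch's unequal-variance t statistic are all assumptions made for illustration, since the abstract does not say which t-test variant was used. Only mean imputation is shown; KNN and EM imputation would replace the first step.

```python
import math
from statistics import mean, variance


def mean_impute(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    col_means = [
        mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)
    ]
    return [
        [col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]


def welch_t(a, b):
    """Welch's two-sample t statistic (assumed variant; needs non-zero variance)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_features(rows, labels):
    """Return column indices ordered by |t| between the two classes, largest first."""
    scores = []
    for j in range(len(rows[0])):
        survived = [r[j] for r, y in zip(rows, labels) if y == 1]
        closed = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append((abs(welch_t(survived, closed)), j))
    return [j for _, j in sorted(scores, reverse=True)]


# Hypothetical toy data: two features, None marks a missing value.
X = [
    [10.0, 1.0],
    [12.0, None],
    [11.0, 3.0],
    [2.0, 2.0],
    [None, 1.0],
    [3.0, 3.0],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = business survived, 0 = business closed

X_filled = mean_impute(X)
ranking = rank_features(X_filled, y)
print(ranking)
```

In this toy example the first column separates the two classes and the second does not, so the first column is ranked highest. In a full pipeline the top-ranked columns would then be passed to the classifiers being compared.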

Taxonomy

Topics: Machine Learning and ELM

Methods: Logistic Regression