Multi-objective Feature Selection with Missing Data in Classification
Yu Xue, Yihang Tang, Xin Xu, Jiayu Liang, Ferrante Neri

TL;DR
This paper introduces a novel three-objective feature selection model that incorporates data reliability to improve classification performance on datasets with missing data, using NSGA-III for optimization.
Contribution
It extends traditional bi-objective feature selection by adding reliability as a third objective and applies NSGA-III to effectively solve this enhanced problem.
Findings
The three-objective model improves feature selection on incomplete datasets.
NSGA-III finds feature subsets that trade off classification accuracy, number of features, and reliability.
Experiments on six incomplete data sets from the UCI machine learning repository, using mean imputation, support the effectiveness of the proposed approach.
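The trade-off among the three objectives can be illustrated with a Pareto dominance check, the relation that non-dominated sorting in NSGA-III builds on. This is a minimal sketch rather than NSGA-III itself: it assumes all three objectives are expressed as values to minimize (error rate, fraction of features kept, unreliability), and the candidate tuples are made-up illustrative numbers, not results from the paper.

```python
def dominates(a, b):
    """Return True if a is no worse than b on every objective and
    strictly better on at least one (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (error, feature_fraction, unreliability) tuples for four subsets.
candidates = [
    (0.10, 0.50, 0.20),
    (0.12, 0.30, 0.05),
    (0.15, 0.60, 0.25),  # worse than the first tuple on every objective
    (0.10, 0.50, 0.30),  # ties the first tuple twice, loses on unreliability
]

front = non_dominated(candidates)
print(front)
```

The first two candidates survive because each beats the other on at least one objective; a full NSGA-III run would additionally use reference points to keep such a front well spread.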
Abstract
Feature selection (FS) is an important research topic in machine learning. Usually, FS is modelled as a bi-objective optimization problem whose objectives are: 1) classification accuracy; 2) number of features. One of the main issues in real-world applications is missing data. Databases with missing data are likely to be unreliable. Thus, FS performed on a data set missing some data is also unreliable. In order to directly control this issue plaguing the field, we propose in this study a novel modelling of FS: we include reliability as the third objective of the problem. In order to address the modified problem, we propose the application of the non-dominated sorting genetic algorithm-III (NSGA-III). We selected six incomplete data sets from the University of California Irvine (UCI) machine learning repository. We used the mean imputation method to deal with the missing data. In the…
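The setup described in the abstract can be sketched as follows: mean imputation fills the missing entries, and each candidate feature subset (a binary mask) is scored on three objectives. The abstract does not define how reliability is measured, so this sketch assumes a hypothetical proxy: the average missing rate of the selected features, to be minimized. The leave-one-out 1-nearest-neighbour classifier, the function names, and the toy data are likewise assumptions for illustration, not the authors' method.

```python
import math


def mean_impute(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def missing_rate(rows):
    """Fraction of missing (None) entries in each column."""
    n = len(rows)
    return [sum(r[j] is None for r in rows) / n for j in range(len(rows[0]))]


def loo_1nn_error(X, y, selected):
    """Leave-one-out error of a 1-NN classifier restricted to the selected columns."""
    errors = 0
    for i, xi in enumerate(X):
        best_dist, best_label = math.inf, None
        for k, xk in enumerate(X):
            if k == i:
                continue
            dist = sum((xi[j] - xk[j]) ** 2 for j in selected)
            if dist < best_dist:
                best_dist, best_label = dist, y[k]
        errors += best_label != y[i]
    return errors / len(X)


def evaluate(mask, raw_rows, y):
    """Return (error, feature_fraction, unreliability), all to be minimized."""
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        # Assumption: an empty subset cannot classify, so it gets the
        # worst error and the worst unreliability.
        return (1.0, 0.0, 1.0)
    X = mean_impute(raw_rows)
    rates = missing_rate(raw_rows)
    error = loo_1nn_error(X, y, selected)
    feature_fraction = len(selected) / len(mask)
    # Hypothetical reliability proxy: mean missing rate of the chosen features.
    unreliability = sum(rates[j] for j in selected) / len(selected)
    return (error, feature_fraction, unreliability)


# Toy data set with missing entries marked as None (made up for illustration).
raw = [
    [1.0, None, 5.0],
    [1.2, 0.3, None],
    [3.0, None, 5.1],
    [3.1, 0.2, 4.9],
]
labels = [0, 0, 1, 1]

print(evaluate([1, 0, 0], raw, labels))
print(evaluate([0, 1, 0], raw, labels))
```

An optimizer such as NSGA-III would call a function like `evaluate` on each mask in its population; the missing rates are computed on the raw data so that imputed values do not hide how much of a feature was actually observed.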
