Minimal cost feature selection of data with normal distribution measurement errors
Hong Zhao, Fan Min, William Zhu

TL;DR
This paper introduces a novel approach for minimal cost feature selection on numerical data with normal distribution measurement errors, balancing test and misclassification costs using a backtracking algorithm.
Contribution
It develops a data model for numerical data with normally distributed measurement errors, builds each object's neighborhood from confidence intervals, and proposes an efficient backtracking algorithm with pruning that selects the feature subset with minimal total cost.
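The confidence-interval neighborhood can be illustrated with a minimal sketch. It assumes that each feature `a` carries an additive error drawn from N(0, σ_a²) with known σ_a, so that at the chosen confidence level a measured value x lies within x ± zσ_a of the true value. It also assumes that two objects are neighbors on a feature subset when their intervals overlap on every selected feature. The function names, the example data, and the overlap rule are illustrative assumptions, not the authors' exact definitions.

```python
from statistics import NormalDist
from typing import List, Sequence


def interval_half_width(sigma: float, confidence: float = 0.95) -> float:
    """Half-width z * sigma of a two-sided confidence interval for N(0, sigma^2) error."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 at 95% confidence
    return z * sigma


def neighborhood(
    data: List[List[float]],
    i: int,
    features: Sequence[int],
    sigmas: Sequence[float],
    confidence: float = 0.95,
) -> List[int]:
    """Indices of objects whose intervals overlap object i's on every selected feature."""
    widths = {a: interval_half_width(sigmas[a], confidence) for a in features}
    result = []
    for j, row in enumerate(data):
        # [x - w, x + w] and [y - w, y + w] overlap exactly when |x - y| <= 2w.
        if all(abs(data[i][a] - row[a]) <= 2 * widths[a] for a in features):
            result.append(j)
    return result


# Hypothetical measurements: four objects, two features.
data = [[1.0, 5.0], [1.1, 5.2], [2.0, 5.1], [1.05, 9.0]]
sigmas = [0.05, 0.1]  # assumed per-feature error standard deviations
print(neighborhood(data, 0, [0], sigmas))     # [0, 1, 3]
print(neighborhood(data, 0, [0, 1], sigmas))  # [0, 1]
```

Object 3 leaves object 0's neighborhood once the second feature is tested. This shows how paying for more tests yields smaller, purer neighborhoods.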
Findings
Pruning techniques significantly improve algorithm efficiency.
The algorithm performs well on data sets with nearly 1000 objects.
Neighborhoods based on confidence intervals preserve data information better than discretized intervals.
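The effect of pruning can be seen in a generic backtracking sketch. The search enumerates feature subsets in increasing index order and applies one simple bound. This bound is an assumption here and not necessarily one of the paper's three pruning techniques: because misclassification costs are non-negative, a subset's total cost is at least its test cost. Any branch whose accumulated test cost already reaches the best total cost found so far can therefore be cut. `total_cost` is a hypothetical callable supplied by the caller, and test costs are assumed non-negative.

```python
from typing import Callable, FrozenSet, List, Tuple


def backtrack_min_cost(
    test_costs: List[float],
    total_cost: Callable[[FrozenSet[int]], float],
) -> Tuple[FrozenSet[int], float]:
    """Return the feature subset with minimal total cost, pruning by test cost."""
    n = len(test_costs)
    best_subset: FrozenSet[int] = frozenset()
    best_cost = total_cost(best_subset)  # cost of selecting no features

    def search(start: int, chosen: List[int], tc: float) -> None:
        nonlocal best_subset, best_cost
        for a in range(start, n):
            new_tc = tc + test_costs[a]
            if new_tc >= best_cost:
                # Prune: this subset and all its supersets cost at least new_tc.
                continue
            chosen.append(a)
            subset = frozenset(chosen)
            cost = total_cost(subset)
            if cost < best_cost:
                best_subset, best_cost = subset, cost
            search(a + 1, chosen, new_tc)
            chosen.pop()

    search(0, [], 0.0)
    return best_subset, best_cost


test_costs = [1.0, 2.0, 3.0]


def toy_total_cost(subset: FrozenSet[int]) -> float:
    # Hypothetical misclassification costs: feature 0 helps, features 0 and 1 together are perfect.
    mc = 10.0 if 0 not in subset else (0.0 if 1 in subset else 4.0)
    return sum(test_costs[a] for a in subset) + mc


print(backtrack_min_cost(test_costs, toy_total_cost))  # (frozenset({0, 1}), 3.0)
```

In this toy run, every subset containing feature 2 is cut without calling `total_cost`, because its test cost alone already reaches the incumbent of 3.0.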
Abstract
Minimal cost feature selection aims to obtain a trade-off between test costs and misclassification costs. This issue has recently been addressed on nominal data. In this paper, we consider numerical data with measurement errors and study minimal cost feature selection in this model. First, we build a data model with normal distribution measurement errors. Second, the neighborhood of each data item is constructed through the confidence interval. Compared with discretized intervals, neighborhoods better maintain the information of the data. Third, we define a new minimal total cost feature selection problem by considering the trade-off between test costs and misclassification costs. Fourth, we propose a backtracking algorithm with three effective pruning techniques to solve this problem. The algorithm is tested on four UCI data sets. Experimental results…
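The total cost being minimized can be sketched as the test cost of the selected features plus the average misclassification cost over all objects. This summary does not state how the paper labels an object whose neighborhood mixes classes. The sketch assumes each object receives the class that minimizes the summed misclassification cost over its neighborhood, and that neighborhoods have already been computed on the selected subset. All names and numbers are hypothetical.

```python
from typing import List, Sequence


def average_total_cost(
    subset: Sequence[int],
    neighborhoods: List[List[int]],
    labels: List[int],
    test_costs: Sequence[float],
    mis_cost: List[List[float]],
) -> float:
    """Test cost of `subset` plus the average misclassification cost per object.

    neighborhoods[x] lists the objects in x's neighborhood on `subset`;
    mis_cost[true][pred] is the cost of predicting `pred` for class `true`.
    """
    test_cost = sum(test_costs[a] for a in subset)
    classes = range(len(mis_cost))
    total_mc = 0.0
    for x, nbrs in enumerate(neighborhoods):
        # Assumed rule: pick the class with the lowest summed cost over the neighborhood.
        pred = min(classes, key=lambda d: sum(mis_cost[labels[y]][d] for y in nbrs))
        total_mc += mis_cost[labels[x]][pred]
    return test_cost + total_mc / len(labels)


labels = [0, 0, 1, 1]
neighborhoods = [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3]]
mis_cost = [[0.0, 50.0], [200.0, 0.0]]
print(average_total_cost([0], neighborhoods, labels, [3.0, 5.0], mis_cost))  # 15.5
```

In this example, misclassifying a class-1 object costs four times as much as the reverse. Object 1's mixed neighborhood is therefore labeled class 1, and the object is charged 50, giving 3 + 50/4 = 15.5.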
Taxonomy
Topics: Rough Sets and Fuzzy Logic · Bayesian Modeling and Causal Inference · Data Mining Algorithms and Applications
