Pre-Selection of Independent Binary Features: An Application to Diagnosing Scrapie in Sheep
Ludmila Kuncheva, C. Whitaker, P. Cockcroft, Z. S. Hoare

TL;DR
This paper presents a method for selecting independent binary features using Sequential Forward Selection, applied to diagnosing Scrapie in sheep, with a focus on feature stability and optimal subset identification.
Contribution
It introduces a feature selection approach based on Naive Bayes assumptions and analyzes its robustness in the context of diagnosing Scrapie in sheep.
Findings
Selected features remain stable under probability variations
Sequential Forward Selection effectively identifies relevant features
Application to Scrapie diagnosis demonstrates practical utility
Abstract
Suppose that the only available information in a multi-class problem is a set of expert estimates of the conditional probabilities of occurrence for a set of binary features. The aim is to select a subset of features to be measured in subsequent data collection experiments. In the absence of any information about the dependencies between the features, we assume that all features are conditionally independent and hence choose the Naive Bayes classifier as the optimal classifier for the problem. Even in this (seemingly trivial) case of complete knowledge of the distributions, choosing an optimal feature subset is not straightforward. We discuss the properties and implementation details of Sequential Forward Selection (SFS) as a feature selection procedure for the current problem. A sensitivity analysis was carried out to investigate whether the same features are selected when the probabilities vary…
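The procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the selection criterion is the exact classification error of Naive Bayes computed from the conditional probabilities, and the names `nb_error` and `sfs`, the probability table `P`, and the priors are all hypothetical values chosen for illustration.

```python
from itertools import product


def nb_error(P, priors, subset):
    """Exact error of Naive Bayes restricted to the features in `subset`.

    P[c][j] is the assumed P(feature j = 1 | class c). Under conditional
    independence the joint distribution is fully known, so the error can be
    computed by enumerating all 2^len(subset) binary patterns.
    """
    correct = 0.0
    for x in product((0, 1), repeat=len(subset)):
        joint = []
        for c, prior in enumerate(priors):
            p = prior
            for bit, j in zip(x, subset):
                p *= P[c][j] if bit else 1.0 - P[c][j]
            joint.append(p)
        # The classifier picks the most probable class for this pattern.
        correct += max(joint)
    return 1.0 - correct


def sfs(P, priors, k):
    """Greedy Sequential Forward Selection of up to k features."""
    selected, history = [], []
    remaining = list(range(len(P[0])))
    while remaining and len(selected) < k:
        # Add the single feature that gives the lowest error when joined
        # with the features already selected (ties go to the lowest index).
        best = min(remaining, key=lambda j: nb_error(P, priors, selected + [j]))
        selected.append(best)
        remaining.remove(best)
        history.append(nb_error(P, priors, selected))
    return selected, history


# Hypothetical two-class problem with three binary features.
# Feature 1 has the same probability in both classes, so it is uninformative.
P = [
    [0.8, 0.5, 0.60],  # P(feature = 1 | class 0)
    [0.2, 0.5, 0.05],  # P(feature = 1 | class 1)
]
priors = [0.5, 0.5]

selected, history = sfs(P, priors, 3)
print(selected, history)
```

With these made-up numbers the sketch selects the features in the order 0, 2, 1: the error falls from 0.2 to 0.16 when feature 2 is added, and the uninformative feature 1 leaves it unchanged. Because the enumeration grows as 2^n in the subset size, this exact evaluation is only practical for small subsets.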
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Imbalanced Data Classification Techniques
