Selection of a Minimal Number of Significant Porcine SNPs by an Information Gain and Genetic Algorithm Hybrid Model
Wanthanee Rathasamuth, Kitsuchart Pasupa, Sissades Tongsima

TL;DR
This paper introduces a hybrid feature selection method combining information gain, genetic algorithms, and frequency filtering to identify a minimal set of significant porcine SNPs, achieving high breed classification accuracy.
Contribution
It presents a novel hybrid SNP selection approach that effectively reduces the number of SNPs needed for accurate pig breed classification.
Findings
Reduced SNPs to 0.86% of total
Achieved 94.80% classification accuracy
Demonstrated effectiveness of hybrid feature selection
Abstract
A panel of a large number of common Single Nucleotide Polymorphisms (SNPs) distributed across the entire porcine genome has been widely used to represent the genetic variability of pigs. With the advent of SNP-array technology, a genome-wide genetic profile of a specimen can be easily observed. Among the large number of such variations, there exists a much smaller subset of the SNP panel that could equally be used to correctly identify the corresponding breed. This work presents a SNP selection heuristic that finds such a subset, one that can still be used effectively in the breed classification process. The proposed feature selection combined a filter method and a wrapper method (information gain and a genetic algorithm) with a feature frequency selection step, while classification was done by a support vector machine. The approach was able to reduce the number of significant SNPs to 0.86 %…
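To make the filter stage mentioned in the abstract more concrete, the following is a minimal sketch of ranking SNPs by information gain with respect to breed labels. It is an illustration only, not the authors' implementation: the function names (`entropy`, `information_gain`, `rank_snps`), the 0/1/2 genotype coding, the toy data, and the `top_k` cutoff are all assumptions, and the genetic algorithm, frequency selection, and support vector machine stages are not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(genotypes, breeds):
    """Drop in breed entropy after grouping samples by one SNP's genotype."""
    total = len(breeds)
    groups = {}
    for genotype, breed in zip(genotypes, breeds):
        groups.setdefault(genotype, []).append(breed)
    # Weighted average entropy of the breed labels within each genotype group.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(breeds) - conditional


def rank_snps(snp_columns, breeds, top_k):
    """Return the names of the top_k SNPs with the highest information gain."""
    scores = {
        name: information_gain(column, breeds)
        for name, column in snp_columns.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy data: four animals, two breeds, three SNPs coded 0/1/2.
breeds = ["A", "A", "B", "B"]
snps = {
    "snp_1": [0, 0, 2, 2],  # separates the two breeds perfectly
    "snp_2": [0, 1, 0, 1],  # carries no information about breed
    "snp_3": [0, 0, 0, 2],  # partially informative
}
top = rank_snps(snps, breeds, top_k=2)
print(top)
```

In a filter-then-wrapper design of the kind the abstract describes, a ranking like this would only shrink the candidate pool; the wrapper stage would then search subsets of the surviving SNPs using classifier accuracy as the fitness signal.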
Taxonomy
Methods: Feature Selection
