# Selection of a Minimal Number of Significant Porcine SNPs by an Information Gain and Genetic Algorithm Hybrid Model

**Authors:** Wanthanee Rathasamuth, Kitsuchart Pasupa, Sissades Tongsima

arXiv: 1905.09059 · 2025-02-06

## TL;DR

This paper introduces a hybrid feature selection method that combines information gain, a genetic algorithm, and a feature frequency selection step to identify a minimal set of significant porcine SNPs, achieving 94.80% breed classification accuracy with a support vector machine.
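The information gain filter scores each SNP independently by how much knowing its genotype reduces uncertainty about the breed label. A minimal sketch, assuming genotypes are coded as 0/1/2 allele counts and breeds as string labels; the names `entropy`, `information_gain`, and `rank_snps` and the `top_k` cutoff are hypothetical illustrations, not the authors' code:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(genotypes, labels):
    """IG(breed; SNP) = H(breed) - sum_v P(SNP = v) * H(breed | SNP = v)."""
    n = len(labels)
    conditional = 0.0
    for value in set(genotypes):
        # Breed labels of the samples carrying this genotype value.
        subset = [y for g, y in zip(genotypes, labels) if g == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

def rank_snps(matrix, labels, top_k):
    """Indices of the top_k SNP columns of a samples x SNPs matrix, by information gain."""
    n_snps = len(matrix[0])
    scores = [information_gain([row[j] for row in matrix], labels) for j in range(n_snps)]
    return sorted(range(n_snps), key=lambda j: scores[j], reverse=True)[:top_k]
```

Because each SNP is scored in isolation, a filter like this ignores interactions between markers; its role in a hybrid scheme is to shrink the candidate pool cheaply before a more expensive wrapper search.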

## Contribution

It presents a hybrid SNP selection approach (an information gain filter, a genetic algorithm wrapper, and a feature frequency selection step) that sharply reduces the number of SNPs needed for accurate pig breed classification.
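The abstract's final feature frequency selection step is not detailed in this summary. One common reading, assumed here rather than taken from the paper, is to repeat the stochastic search several times and keep only the SNPs chosen in at least a given fraction of runs. A minimal sketch; `frequent_features` and `min_fraction` are hypothetical names:

```python
from collections import Counter

def frequent_features(runs, min_fraction):
    """Return SNP indices chosen in at least min_fraction of the selection runs.

    runs: one list of selected SNP indices per run (assumed, e.g. one per GA seed).
    """
    # Count each SNP at most once per run, even if a run lists it twice.
    counts = Counter(idx for run in runs for idx in set(run))
    threshold = min_fraction * len(runs)
    return sorted(idx for idx, c in counts.items() if c >= threshold)
```

For example, `frequent_features([[1, 2, 3], [2, 3, 4], [3, 5]], 0.5)` keeps SNPs 2 and 3, since each appears in at least half of the three runs. Raising `min_fraction` trades a smaller panel for a higher risk of dropping useful markers.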

## Key findings

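- Reduced the panel to 0.86% of the total number of SNPs in the swine dataset
- Achieved 94.80% breed classification accuracy with a support vector machine on the reduced panel
- Showed that combining a filter method (information gain) with a wrapper method (genetic algorithm) can yield a compact yet accurate SNP subset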
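As a worked reading of the first finding: keeping 0.86% of a panel of $N$ SNPs leaves $k = 0.0086\,N$ markers. The summary does not state the dataset's SNP count, so the $N$ below is a hypothetical round number used only for illustration.

```latex
k = 0.0086\,N, \qquad N = 10\,000 \;\Rightarrow\; k = 0.0086 \times 10\,000 = 86
```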

## Abstract

A panel of a large number of common Single Nucleotide Polymorphisms (SNPs) distributed across the entire porcine genome has been widely used to represent the genetic variability of pigs. With the advent of SNP-array technology, a genome-wide genetic profile of a specimen can be easily observed. Among this large number of variations, there exists a much smaller subset of the SNP panel that could equally be used to correctly identify the corresponding breed. This work presents a SNP selection heuristic whose reduced SNP set can still be used effectively in the breed classification process. The proposed feature selection combined a filter method and a wrapper method (information gain and a genetic algorithm) plus a feature frequency selection step, while classification was done by a support vector machine. The approach reduced the number of significant SNPs to 0.86% of the total number of SNPs in a swine dataset and provided a high classification accuracy of 94.80%.
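The wrapper stage searches over subsets of SNPs and scores each candidate subset by how well a classifier performs on it. The following is a minimal genetic algorithm sketch over bit masks (one bit per SNP). It assumes tournament selection, one-point crossover, bit-flip mutation, elitism, and a small size penalty in the fitness; none of these operator choices or parameter values come from the paper. The paper classifies with a support vector machine. To keep the sketch dependency-free, fitness here uses a nearest-centroid classifier's training accuracy as a plainly swapped-in stand-in. All function names are hypothetical.

```python
import random

def nearest_centroid_accuracy(X, y, mask):
    """Stand-in for the paper's SVM: training accuracy of a nearest-centroid
    classifier that only looks at SNP columns where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    centroids = {}
    for label in sorted(set(y)):  # sorted so ties break deterministically
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]

    def predict(x):
        # Squared Euclidean distance to each breed centroid on the selected SNPs.
        return min(centroids, key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])))

    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)

def fitness(X, y, mask, size_penalty=0.01):
    """Accuracy minus a small penalty proportional to the fraction of SNPs kept."""
    return nearest_centroid_accuracy(X, y, mask) - size_penalty * sum(mask) / len(mask)

def genetic_select(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Evolve bit masks over SNP columns and return the fittest mask found."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}

    def fit(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(X, y, mask)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fit, reverse=True)
        next_pop = ranked[:2]  # elitism: keep the two best masks unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random masks, twice.
            p1 = max(rng.sample(ranked, 3), key=fit)
            p2 = max(rng.sample(ranked, 3), key=fit)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each SNP toggles in/out with probability p_mut.
            child = [1 - b if rng.random() < p_mut else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fit)
```

In a faithful setup the stand-in would be replaced by an SVM scored with cross-validation, since training accuracy tends to reward overfitting subsets. Running `genetic_select` with several seeds also yields per-run selections that a frequency-based step can aggregate.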

---
Source: https://tomesphere.com/paper/1905.09059