Random Fragments Classification of Microbial Marker Clades with Multi-class SVM and N-Best Algorithm
Jingwei Liu

TL;DR
This paper presents a novel SVM-based framework with an N-best algorithm and time series feature extraction for classifying microbial marker genome fragments, reporting recognition accuracy above 91% within the top-10 candidates.
Contribution
It introduces a new classification framework that combines a multi-class SVM, an N-best algorithm, and time series features for microbial genome fragment recognition.
Findings
Recognition accuracy above 28% for the top-1 candidate
Recognition accuracy above 91% within the top-10 candidates
Effective classification across multiple species and fragment recognition strategies
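To make the top-1 and top-10 figures concrete, here is a minimal sketch of how an N-best accuracy can be computed from per-class scores. It is an illustration only, not the paper's implementation: the function names `n_best` and `top_n_accuracy`, the toy score rows, and the labels are all assumptions made for this example. A fragment counts as recognized when its true class appears among the N highest-scoring candidates.

```python
def n_best(scores, n):
    """Return the indices of the n highest-scoring classes, best first."""
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return order[:n]


def top_n_accuracy(score_rows, labels, n):
    """Fraction of samples whose true label is among the n best candidates."""
    hits = sum(1 for scores, y in zip(score_rows, labels) if y in n_best(scores, n))
    return hits / len(labels)


# Toy per-class scores (hypothetical), one row per fragment, three classes.
score_rows = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.5, 0.3],
]
labels = [1, 1, 2]  # hypothetical true class of each fragment

top1 = top_n_accuracy(score_rows, labels, 1)
top2 = top_n_accuracy(score_rows, labels, 2)
print(top1, top2)
```

In practice the score rows would come from the decision values of a multi-class classifier; widening N can only keep or raise the accuracy, which is why the top-10 figure is so much higher than the top-1 figure.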
Abstract
Modeling microbial clades from microarray genome sequences is a challenging problem in biology, especially for discovering and categorizing gene isolates of new species. Marker family genome sequences play an important role in describing specific microbial clades within species. A framework for support vector machine (SVM) based microbial species classification with an N-best algorithm is constructed to classify centroid marker genome fragments randomly generated from marker genome sequences on MetaRef. A time series feature extraction method is proposed that segments the centroid gene sequences and maps them into spaces of different dimensions. Two ways of splitting the data are investigated: randomly splitting fragments along the genome sequence (DI), or separating genome sequences into two parts (DII). Two strategies for fragment recognition tasks, dimension-by-dimension and…
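The fragment generation and segment-based feature extraction described in the abstract can be sketched roughly as follows. This is a guess at the general shape of the procedure, not the author's method: the numeric encoding in `CODE`, the use of a per-segment mean, and the helper names `random_fragment` and `segment_features` are all assumptions introduced for illustration. The sketch draws a random fragment from a sequence, cuts it into `dims` contiguous segments, and summarizes each segment by one number, so that different values of `dims` map the same fragment into spaces of different dimensions.

```python
import random

# Hypothetical numeric encoding of nucleotides (an assumption, not from the paper).
CODE = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def random_fragment(seq, length, rng):
    """Draw a contiguous fragment of the given length at a random position."""
    start = rng.randrange(len(seq) - length + 1)
    return seq[start:start + length]


def segment_features(fragment, dims):
    """Split the fragment into dims contiguous segments and average each one."""
    if len(fragment) < dims:
        raise ValueError("fragment is shorter than the number of segments")
    values = [CODE[base] for base in fragment]
    # Segment boundaries; strictly increasing because len(values) >= dims.
    bounds = [i * len(values) // dims for i in range(dims + 1)]
    return [sum(values[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]


rng = random.Random(0)  # fixed seed so the draw is repeatable
fragment = random_fragment("ACGTACGTAC", 4, rng)
features = segment_features("AACCGGTT", 4)
print(fragment, features)
```

Feature vectors of this kind, one per fragment, would then be the inputs to the multi-class SVM.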
Taxonomy
Topics: Genomics and Phylogenetic Studies · Gene expression and cancer classification · Machine Learning in Bioinformatics
