Hidden Markov Models for Gene Sequence Classification: Classifying the VSG genes in the Trypanosoma brucei Genome
Andrea Mesa, Sebastián Basterrech, Gustavo Guerberoff, Fernando Alvarez-Valin

TL;DR
This paper demonstrates that Hidden Markov Models can effectively identify VSG genes in Trypanosoma brucei genomes, outperforming traditional methods due to their ability to handle low sequence identity and gene boundary detection.
Contribution
The study introduces the application of HMMs for VSG gene identification, showing their effectiveness over homology-based methods in parasite genomes.
Findings
HMMs achieve high sensitivity in VSG gene detection.
The model maintains low false positive rates.
Performance varies with the number of states in the HMM.
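To make the role of the HMM's states concrete, the following is a minimal Python sketch of how a DNA sequence can be scored under a discrete HMM with the forward algorithm and then classified by comparing two models. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy parameters (TOY_POS, TOY_BG), and the two-model likelihood comparison are all hypothetical, and in practice the parameters would be estimated from training sequences.

```python
import math

ALPHABET = "ACGT"  # observation symbols; emission columns follow this order


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_loglik(seq, start, trans, emit):
    """Log-likelihood of seq under a discrete HMM (forward algorithm, log space).

    start[s]    : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][k]  : probability that state s emits ALPHABET[k]
    """
    n_states = len(start)
    obs = [ALPHABET.index(c) for c in seq]
    alpha = [safe_log(start[s]) + safe_log(emit[s][obs[0]])
             for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[r] + safe_log(trans[r][s])
                       for r in range(n_states)])
            + safe_log(emit[s][o])
            for s in range(n_states)
        ]
    return logsumexp(alpha)


def classify(seq, model_pos, model_neg):
    """True if seq is more likely under model_pos than under model_neg."""
    return forward_loglik(seq, *model_pos) > forward_loglik(seq, *model_neg)


# Hypothetical toy parameters (NOT taken from the paper).
# Two-state model: state 0 favours G/C, state 1 favours A/T.
TOY_POS = (
    [0.5, 0.5],
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.1, 0.4, 0.4, 0.1], [0.4, 0.1, 0.1, 0.4]],
)
# One-state background model with uniform emissions.
TOY_BG = ([1.0], [[1.0]], [[0.25, 0.25, 0.25, 0.25]])

if __name__ == "__main__":
    print(classify("GGCCGGCC", TOY_POS, TOY_BG))
```

Adding states to a model changes only the sizes of start, trans, and emit; the scoring code stays the same, which is why the number of states can be treated as a tunable parameter when comparing performance.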
Abstract
The article presents an application of Hidden Markov Models (HMMs) for pattern recognition on genome sequences. We apply HMMs to identify genes encoding the Variant Surface Glycoprotein (VSG) in the genomes of Trypanosoma brucei (T. brucei) and other African trypanosomes. These parasitic protozoa are the causative agents of sleeping sickness and several diseases in domestic and wild animals. These parasites have a peculiar strategy to evade the host's immune system, which consists of periodically changing their predominant cellular surface protein (VSG). The motivation for using pattern recognition methods to identify these genes, instead of traditional homology-based ones, is that the levels of sequence identity (amino acid and DNA sequence) among these genes are often below what is considered reliable for these methods. Among pattern recognition approaches, HMMs are particularly…
Taxonomy
Topics: Genomics and Phylogenetic Studies · Machine Learning in Bioinformatics · RNA and protein synthesis mechanisms
