Application of Markov Structure of Genomes to Outlier Identification and Read Classification
Alan F. Karr, Jason Hauzel, Adam A. Porter, Marcel Schaefer

TL;DR
This paper leverages the Markov structure of genomes to improve outlier detection in genome databases and enhance read classification in metagenomics, demonstrating practical applications with coronavirus and adenovirus data.
Contribution
It introduces a novel approach using second-order Markov models for genome analysis, specifically for outlier detection and read classification in metagenomics.
Findings
Effective outlier identification in genome databases
Improved accuracy in read classification for metagenomics
Validated with coronavirus and adenovirus datasets
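One way to picture outlier identification from triplet distributions is the minimal sketch below. It is an illustration under stated assumptions, not the authors' procedure: the Hellinger distance, the comparison against the mean distribution of the database, the fixed threshold, and the names triplet_distribution, hellinger, and flag_outliers are all hypothetical choices made here.

```python
from collections import Counter
from itertools import product
import math

BASES = "ACGT"
TRIPLETS = ["".join(p) for p in product(BASES, repeat=3)]  # 64 triplets


def triplet_distribution(genome):
    """Relative frequencies of the 64 overlapping base triplets."""
    counts = Counter(genome[i:i + 3] for i in range(len(genome) - 2))
    # Triplets containing non-ACGT symbols (e.g. N) are ignored.
    total = sum(counts[t] for t in TRIPLETS)
    if total == 0:
        raise ValueError("genome contains no valid ACGT triplets")
    return [counts[t] / total for t in TRIPLETS]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 to 1)."""
    return math.sqrt(
        0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    )


def flag_outliers(genomes, threshold):
    """Names of genomes whose triplet distribution is far from the mean.

    Assumption: distance to the database mean with a fixed threshold
    is an illustrative criterion only.
    """
    dists = {name: triplet_distribution(seq) for name, seq in genomes.items()}
    n = len(dists)
    centroid = [sum(d[k] for d in dists.values()) / n for k in range(len(TRIPLETS))]
    return [name for name, d in dists.items() if hellinger(d, centroid) > threshold]


# Toy database: three similar synthetic sequences and one very different one.
genomes = {
    "g1": "ACGT" * 50,
    "g2": "CGTA" * 50,
    "g3": "GTAC" * 50,
    "odd": "A" * 200,
}
flagged = flag_outliers(genomes, threshold=0.5)
```

In practice the threshold would have to be chosen from the spread of distances in the database at hand; the value 0.5 only suits this toy input.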
Abstract
In this paper we apply the structure of genomes as second-order Markov processes specified by the distributions of successive triplets of bases to two bioinformatics problems: identification of outliers in genome databases and read classification in metagenomics, using real coronavirus and adenovirus data.
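To make the second-order Markov idea concrete: the triplet distribution P(abc) determines the transition probabilities through P(c | ab) = P(abc) / Σ_c' P(abc'), and a read can then be scored under each candidate genome's model. The sketch below is a hedged illustration of that reasoning, not the paper's implementation. The maximum log-likelihood rule, the pseudocount smoothing, and the names fit_second_order, log_likelihood, and classify_read are assumptions introduced here.

```python
from collections import defaultdict
import math

BASES = "ACGT"


def fit_second_order(genome, pseudocount=1.0):
    """Estimate P(next base | previous two bases) from triplet counts.

    The pseudocount (an assumption) keeps unseen transitions from
    getting probability zero.
    """
    counts = defaultdict(lambda: dict.fromkeys(BASES, 0))
    for i in range(len(genome) - 2):
        pair, nxt = genome[i:i + 2], genome[i + 2]
        if nxt in BASES and all(b in BASES for b in pair):
            counts[pair][nxt] += 1
    model = {}
    for a in BASES:
        for b in BASES:
            row = counts[a + b]
            total = sum(row.values()) + 4 * pseudocount
            model[a + b] = {c: (row[c] + pseudocount) / total for c in BASES}
    return model


def log_likelihood(read, model):
    """Log-likelihood of a read, conditioning on its first two bases."""
    ll = 0.0
    for i in range(len(read) - 2):
        pair, nxt = read[i:i + 2], read[i + 2]
        if pair in model and nxt in BASES:  # skip ambiguous symbols
            ll += math.log(model[pair][nxt])
    return ll


def classify_read(read, models):
    """Assign the read to the genome whose model scores it highest."""
    return max(models, key=lambda name: log_likelihood(read, models[name]))


# Toy example with two synthetic "genomes" (not real viral data).
models = {
    "x": fit_second_order("ACGT" * 100),
    "y": fit_second_order("AAGGCCTT" * 50),
}
label_1 = classify_read("CGTACGTACG", models)
label_2 = classify_read("GGCCTTAAGG", models)
```

The sketch ignores the distribution of a read's initial pair and the reverse-complement strand, both of which a real metagenomic classifier would need to handle.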
Taxonomy
Topics: Machine Learning in Bioinformatics · Algorithms and Data Compression
