Exploration and retrieval of whole-metagenome sequencing samples
Sohan Seth, Niko Välimäki, Samuel Kaski, Antti Honkela

TL;DR
This paper introduces a content-based method for exploring and retrieving whole metagenome sequencing samples using a distributed string mining framework to extract informative sequence features, enabling accurate sample comparison.
Contribution
It presents a novel unsupervised approach leveraging distributed string mining to efficiently compare and retrieve metagenomic samples based on sequence content.
Findings
Effective discrimination of different body sites.
Enrichment of diseased samples in query results.
High accuracy in sample comparison.
Abstract
In recent years, whole metagenome shotgun sequencing has grown significantly, driven by high-throughput sequencing technologies that allow genomic samples to be sequenced more cheaply, more quickly, and with better coverage than before. This technical advancement has started a trend of sequencing multiple samples from different conditions or environments to explore the similarities and differences between microbial communities. Examples include the Human Microbiome Project and various studies of the human intestinal tract. As databases of such measurements grow ever larger, finding samples similar to a given query sample is becoming a central operation. In this paper, we develop a content-based exploration and retrieval method for whole metagenome sequencing samples. We apply a distributed string mining framework to efficiently extract all informative…
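The abstract describes retrieval by comparing the sequence content of a query sample against every sample in a database. The following minimal Python sketch illustrates that idea under simplifying assumptions. It counts fixed-length k-mers, a swapped-in stand-in for the paper's distributed string mining (which extracts informative substrings and is not reproduced here), and ranks database samples by cosine similarity to the query. The function names (`kmer_profile`, `cosine`, `retrieve`), the choice `k=4`, and the toy reads are all hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_profile(reads, k=4):
    """Count every length-k substring across a sample's reads.

    Hypothetical fixed-k stand-in for the paper's string mining step.
    """
    counts = Counter()
    for read in reads:
        # Slide a window of width k over the read; reads shorter than k add nothing.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count profiles (0.0 if either is empty)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_reads, database, k=4, top=3):
    """Rank database samples by similarity of their k-mer profiles to the query."""
    query = kmer_profile(query_reads, k)
    scores = [
        (name, cosine(query, kmer_profile(reads, k)))
        for name, reads in database.items()
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top]


# Toy usage: each hypothetical sample is a short list of reads.
database = {
    "sample_a": ["ACGTACGTAC", "GTACGTACGT"],
    "sample_b": ["TTTTGGGGCC", "CCCCAAAATT"],
    "sample_c": ["ACGTACGTTT", "GGGGACGTAC"],
}
ranking = retrieve(["ACGTACGTAC"], database, k=4, top=3)
print(ranking)
```

Cosine similarity is used here because it ignores overall sequencing depth, so samples with many more reads are not favoured simply for their size. A real system would also need to weight or filter features by how informative they are, which this sketch omits.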
Taxonomy
Topics: Algorithms and Data Compression · Gut microbiota and health · Genomics and Phylogenetic Studies
