
TL;DR
This paper introduces a method to efficiently store and query genomic databases with SNP variations using compressed suffix arrays, leveraging the uniqueness of substrings between SNPs.
Contribution
It presents a novel approach to compress and index SNP-rich genomic data by exploiting the structure of SNPs and unique substrings, improving storage and query efficiency.
Findings
Shows that a fast compressed suffix array can be stored for SNP databases whose substrings between SNPs are unique
Reduces storage requirements for genomic data with SNP variations
Maintains efficient query performance on SNP-rich genomes
Abstract
Single-nucleotide polymorphisms (SNPs) account for most variations between human genomes. We show how, if the genomes in a database differ only by a reasonable number of SNPs and the substrings between those SNPs are unique, then we can store a fast compressed suffix array for that database.
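The condition in the abstract can be illustrated with a small sketch. This is not the paper's construction: it uses a naive, uncompressed suffix array, and it assumes "unique" means that each substring between consecutive SNP positions occurs exactly once in the reference. The function names (`suffix_array`, `apply_snps`, `segments_between_snps`, `segments_are_unique`) and the toy sequences are hypothetical, chosen only for illustration.

```python
def suffix_array(text):
    """Naive suffix array: suffix start positions in lexicographic order.

    Quadratic-ish and uncompressed; a stand-in for illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def apply_snps(reference, snps):
    """Return a variant genome: the reference with single bases substituted.

    `snps` maps a 0-based position to the replacement base (assumed format).
    """
    genome = list(reference)
    for pos, base in snps.items():
        genome[pos] = base
    return "".join(genome)


def segments_between_snps(reference, snp_positions):
    """Split the reference into the non-empty substrings between SNP positions."""
    segments = []
    start = 0
    for pos in sorted(snp_positions):
        segments.append(reference[start:pos])
        start = pos + 1
    segments.append(reference[start:])
    return [s for s in segments if s]


def count_occurrences(text, pattern):
    """Count occurrences of `pattern` in `text`, including overlapping ones."""
    count = 0
    i = text.find(pattern)
    while i != -1:
        count += 1
        i = text.find(pattern, i + 1)
    return count


def segments_are_unique(reference, snp_positions):
    """Assumed reading of the uniqueness condition: every inter-SNP
    substring occurs exactly once in the reference."""
    return all(
        count_occurrences(reference, segment) == 1
        for segment in segments_between_snps(reference, snp_positions)
    )
```

For example, with the toy reference `"ACGTTGCAAG"` and SNP positions 3 and 7, the inter-SNP substrings are `"ACG"`, `"TGC"`, and `"AG"`, each occurring once, so `segments_are_unique` returns `True`. With the reference `"ACGTACG"` and a single SNP at position 3, both segments are `"ACG"`, so the check returns `False`.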
Taxonomy
Topics: Algorithms and Data Compression · Genomics and Phylogenetic Studies · Fractal and DNA sequence analysis
