TL;DR
copMEM is an algorithm that finds all maximum exact matches (MEMs) of at least a given length between two large genomes. It sparsely samples both genomes with coprime sampling steps, which reduces running time and memory use compared with existing MEM finders.
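The coprime condition can be checked numerically. For illustration, this sketch assumes that the first genome is sampled at every k1-th position and the second at every k2-th position. Consider a match whose two copies are shifted by a fixed offset. If gcd(k1, k2) = 1, the Chinese remainder theorem guarantees that every window of k1·k2 consecutive positions contains a position sampled in both genomes. The function names below are hypothetical and are not part of copMEM.

```python
from math import gcd


def first_common_sample(k1, k2, offset, start):
    # Scan one window of k1*k2 consecutive query positions. A query position q
    # is sampled if q % k2 == 0; its aligned reference position q + offset is
    # sampled if (q + offset) % k1 == 0.
    for q in range(start, start + k1 * k2):
        if q % k2 == 0 and (q + offset) % k1 == 0:
            return q
    return None


def every_window_hit(k1, k2):
    # The condition depends only on offset mod k1 and q mod lcm(k1, k2), so
    # these ranges of offsets and window starts cover every case.
    return all(
        first_common_sample(k1, k2, offset, start) is not None
        for offset in range(k1)
        for start in range(k1 * k2)
    )


for k1, k2 in [(3, 4), (5, 7), (4, 6)]:
    print(k1, k2, gcd(k1, k2), every_window_hit(k1, k2))
# prints:
# 3 4 1 True
# 5 7 1 True
# 4 6 2 False
```

With coprime steps, the check succeeds for every offset. Steps 4 and 6 (gcd 2) miss every odd offset. Under this model, a seed of length K that starts at a doubly sampled position fits inside any match of length at least k1·k2 + K - 1. This is why sparse sampling can still be exhaustive.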
Contribution
The paper introduces copMEM, a sampling-based method for genome comparison that computes MEMs with low running time and memory use.
Findings
Finds all MEMs of length ≥ 100 between the human and mouse genomes in under 2 minutes.
Uses less than 10 GB of RAM for this comparison.
Achieves these results with a single-threaded implementation.
Abstract
Genome-to-genome comparisons require designating anchor points, which are given by Maximum Exact Matches (MEMs) between their sequences. For large genomes this is a challenging problem, and the performance of existing solutions, even in parallel regimes, is not quite satisfactory. We present a new algorithm, copMEM, that sparsely samples both input genomes using coprime sampling steps. Despite being a single-threaded implementation, copMEM computes all MEMs of minimum length 100 between the human and mouse genomes in less than 2 minutes, using less than 10 GB of RAM.
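The following toy, in-memory sketch illustrates MEM finding with coprime sampling. It is written under our own assumptions and does not come from the copMEM source. K-mer seeds of the reference are indexed at positions that are multiples of k1, and the query is scanned at multiples of k2. Each seed hit is then extended left and right until a mismatch. Suppose gcd(k1, k2) = 1 and k1·k2 ≤ min_len - seed_len + 1. Then every MEM of length at least min_len contains a seed that is sampled in both sequences, so no MEM is missed. The name `find_mems` and its parameters are hypothetical.

```python
from collections import defaultdict
from math import gcd


def find_mems(ref, qry, min_len, seed_len, k1, k2):
    """Toy MEM finder with coprime sampling (hypothetical, not copMEM's code).

    Returns sorted (ref_start, qry_start, length) triples for all maximal
    exact matches of length >= min_len.
    """
    if gcd(k1, k2) != 1:
        raise ValueError("sampling steps k1 and k2 must be coprime")
    if k1 * k2 > min_len - seed_len + 1:
        raise ValueError("need k1 * k2 <= min_len - seed_len + 1")

    # Index seeds of the reference that start at multiples of k1.
    index = defaultdict(list)
    for i in range(0, len(ref) - seed_len + 1, k1):
        index[ref[i:i + seed_len]].append(i)

    mems = set()
    # Scan seeds of the query that start at multiples of k2.
    for j in range(0, len(qry) - seed_len + 1, k2):
        for i in index.get(qry[j:j + seed_len], ()):
            # Extend to the left until a mismatch or a sequence start.
            left = 0
            while (i - left > 0 and j - left > 0
                   and ref[i - left - 1] == qry[j - left - 1]):
                left += 1
            # Extend to the right past the seed until a mismatch or an end.
            right = seed_len
            while (i + right < len(ref) and j + right < len(qry)
                   and ref[i + right] == qry[j + right]):
                right += 1
            if left + right >= min_len:
                # Several seeds usually hit the same MEM; the set removes duplicates.
                mems.add((i - left, j - left, left + right))
    return sorted(mems)


ref = "AAAA" + "ACGTACGGTCAT" + "CCCC"
qry = "GGG" + "ACGTACGGTCAT" + "TTTT"
print(find_mems(ref, qry, min_len=10, seed_len=4, k1=2, k2=3))  # [(4, 3, 12)]
```

In the example, the shared 12-character block is reached from two different seed hits but reported once. The actual tool is engineered for whole genomes, and the abstract reports under 10 GB of RAM for human vs. mouse. This dictionary of string slices only illustrates the logic.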
