Anchor points for genome alignment based on Filtered Spaced Word Matches
Chris-Andre Leimeister, Thomas Dencker, Burkhard Morgenstern

TL;DR
This paper introduces a method that uses Filtered Spaced Word Matches to identify anchor points for genome alignment. Used within the existing Mugsy pipeline, these anchor points substantially improve alignment quality for distantly related sequence sets.
Contribution
It proposes a new approach for selecting anchor points in genome alignment based on Filtered Spaced Word Matches, improving alignment quality for distantly related sequences.
Findings
Improved alignment quality for distantly related genomes.
Enhanced performance of Mugsy pipeline with new anchor points.
Demonstrated effectiveness on large genomic sequences.
Abstract
Alignment of large genomic sequences is a fundamental task in computational genome analysis. Most methods for genomic alignment use high-scoring local alignments as anchor points to reduce the search space of the alignment procedure. Speed and quality of these methods therefore depend on the underlying anchor points. Herein, we propose to use Filtered Spaced Word Matches to calculate anchor points for genome alignment. To evaluate this approach, we used these anchor points in the widely used alignment pipeline Mugsy. For distantly related sequence sets, we could substantially improve the quality of alignments produced by Mugsy.
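The abstract does not spell out how a filtered spaced-word match is computed, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a binary pattern in which '1' marks a match position and '0' a don't-care position, it scores each candidate match over the don't-care positions with a simple match/mismatch scheme (an assumption made here for brevity), and it keeps only matches scoring above a threshold. The function names, the scoring values, and the threshold are all hypothetical.

```python
from collections import defaultdict


def spaced_word(seq, start, pattern):
    """Return the characters of seq at the match ('1') positions of pattern."""
    return "".join(seq[start + k] for k, p in enumerate(pattern) if p == "1")


def filtered_spaced_word_matches(s1, s2, pattern, match=1, mismatch=-1, threshold=0):
    """Return (pos1, pos2, score) for spaced-word matches scoring above threshold.

    Illustrative assumptions: scoring uses only the don't-care ('0')
    positions with a flat match/mismatch scheme, and a match is kept
    when its score is strictly greater than `threshold`.
    """
    span = len(pattern)

    # Index every spaced word of s2 by its characters at the match positions.
    index = defaultdict(list)
    for j in range(len(s2) - span + 1):
        index[spaced_word(s2, j, pattern)].append(j)

    hits = []
    for i in range(len(s1) - span + 1):
        for j in index.get(spaced_word(s1, i, pattern), []):
            # Score the candidate over the don't-care positions only.
            score = sum(
                match if s1[i + k] == s2[j + k] else mismatch
                for k, p in enumerate(pattern)
                if p == "0"
            )
            if score > threshold:  # filter step: drop low-scoring matches
                hits.append((i, j, score))
    return hits


if __name__ == "__main__":
    for hit in filtered_spaced_word_matches("ACGTACGT", "ACGTTCGT", "1101"):
        print(hit)
```

In a pipeline such as the one described above, the surviving position pairs would play the role of candidate anchor points; how they are chained and passed to Mugsy is not covered by this sketch.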
Taxonomy
Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression · RNA and protein synthesis mechanisms
