Anchor points for genome alignment based on Filtered Spaced Word Matches

Chris-Andre Leimeister; Thomas Dencker; Burkhard Morgenstern

arXiv:1703.08792·q-bio.GN·March 28, 2017·1 cites

Open Access

TL;DR

This paper proposes using Filtered Spaced Word Matches to compute anchor points for genome alignment. Used within the Mugsy pipeline, these anchor points substantially improve alignment quality for distantly related sequence sets.

Contribution

It proposes selecting anchor points for genome alignment from Filtered Spaced Word Matches rather than from high-scoring local alignments, and evaluates the approach by plugging these anchor points into the Mugsy pipeline.

Findings

01 Improved alignment quality for distantly related genomes.

02 Enhanced performance of Mugsy pipeline with new anchor points.

03 Demonstrated effectiveness on large genomic sequences.

Abstract

Alignment of large genomic sequences is a fundamental task in computational genome analysis. Most methods for genomic alignment use high-scoring local alignments as anchor points to reduce the search space of the alignment procedure. The speed and quality of these methods therefore depend on the underlying anchor points. Herein, we propose to use Filtered Spaced Word Matches to calculate anchor points for genome alignment. To evaluate this approach, we used these anchor points in the widely used alignment pipeline Mugsy. For distantly related sequence sets, we could substantially improve the quality of alignments produced by Mugsy.
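
The abstract does not spell out how a filtered spaced-word match is computed, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes the common definition in which a binary pattern marks match positions (`1`) and don't-care positions (`0`): two windows form a spaced-word match when they agree at every match position, and the match is kept only if a score over the don't-care positions exceeds a threshold. The function name, the pattern `1100101`, the +1/-1 scoring, and the threshold are all illustrative assumptions.

```python
from collections import defaultdict


def spaced_word_matches(seq_a, seq_b, pattern="1100101", threshold=0):
    """Return (i, j, score) triples for filtered spaced-word matches.

    Hypothetical helper: pattern, scoring and threshold are assumptions
    made for illustration, not parameters taken from the paper.
    """
    match_pos = [k for k, c in enumerate(pattern) if c == "1"]
    dont_care = [k for k, c in enumerate(pattern) if c == "0"]
    span = len(pattern)

    # Index every window of seq_a by its characters at the match positions.
    index = defaultdict(list)
    for i in range(len(seq_a) - span + 1):
        key = "".join(seq_a[i + k] for k in match_pos)
        index[key].append(i)

    anchors = []
    for j in range(len(seq_b) - span + 1):
        key = "".join(seq_b[j + k] for k in match_pos)
        for i in index.get(key, []):
            # Score the don't-care positions: +1 for a match, -1 otherwise
            # (a stand-in for a real nucleotide substitution matrix).
            score = sum(
                1 if seq_a[i + k] == seq_b[j + k] else -1 for k in dont_care
            )
            # Filter step: discard matches whose score is too low.
            if score > threshold:
                anchors.append((i, j, score))
    return anchors
```

Each surviving triple gives a pair of start positions that could serve as a candidate anchor. A real pipeline would additionally need to resolve conflicting or overlapping candidates before passing them to an aligner such as Mugsy.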

Taxonomy

Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression · RNA and protein synthesis mechanisms