Practical colinear chaining on sequences revisited
Nicola Rizzo, Manuel Cáceres, Veli Mäkinen

TL;DR
This paper revisits colinear chaining on sequences, introduces a chaining algorithm that is guaranteed to be optimal, and shows that it incurs minimal slowdown on realistic long read datasets.
Contribution
It develops an optimal colinear chaining algorithm with a proven average-case running time, improving on the earlier practical solution (ChainX), which is not guaranteed to be optimal.
Findings
The new algorithm guarantees an optimal chaining cost while keeping an average-case time bound of the same form as that of ChainX.
ChainX can be suboptimal on realistic long read datasets.
The proposed method incurs minimal computational slowdown.
Abstract
Colinear chaining is a classical heuristic for sequence alignment and is widely used in modern practical aligners. Jain et al. (J. Comput. Biol. 2022) proposed an $O(n \log^3 n)$ time algorithm to chain a set of $n$ anchors so that the chaining cost matches the edit distance of the input sequences, when anchors are all the maximal exact matches. Moreover, assuming a uniform and sparse distribution of anchors, they provided a practical solution (ChainX) working in $O(n \cdot \mathsf{SOL} + n \log n)$ average-case time, where $\mathsf{SOL}$ is the cost of the output chain. This practical solution is not guaranteed to be optimal: we study the failing cases, introduce the anchor diagonal distance, and find and implement an optimal algorithm working in $O(n \cdot \mathsf{OPT} + n \log n)$ average-case time, where $\mathsf{OPT}$ is the optimal chaining cost. We…
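To make the notion of chaining cost concrete, here is a minimal quadratic-time sketch in Python. It is not the algorithm of this paper, nor ChainX: it assumes a simplified cost model in which anchors in a chain may not overlap and each jump between consecutive anchors costs the larger of the two gaps it skips. The function name `chain_cost` and the anchor encoding `(q_start, t_start, length)` are illustrative assumptions.

```python
def chain_cost(anchors, len_q, len_t):
    """Minimum gap cost of a chain of non-overlapping anchors.

    Simplified, assumed cost model (not the paper's):
    - an anchor (q, t, l) is an exact match Q[q:q+l] == T[t:t+l];
    - anchor i may precede anchor j only if it ends before j starts
      in both sequences (no overlaps);
    - connecting i to j costs max(gap in Q, gap in T).
    """
    # Zero-length sentinels mark the start and the end of both sequences.
    pts = [(0, 0, 0)] + sorted(anchors) + [(len_q, len_t, 0)]
    inf = float("inf")
    best = [inf] * len(pts)
    best[0] = 0
    for j in range(1, len(pts)):
        qj, tj, _ = pts[j]
        for i in range(j):
            qi, ti, li = pts[i]
            gap_q = qj - (qi + li)  # unmatched characters of Q in between
            gap_t = tj - (ti + li)  # unmatched characters of T in between
            if gap_q >= 0 and gap_t >= 0:
                best[j] = min(best[j], best[i] + max(gap_q, gap_t))
    return best[-1]


# Q = "ACGTACGT", T = "ACGTTACGT": two anchors separated by one inserted "T".
print(chain_cost([(0, 0, 4), (4, 5, 4)], 8, 9))
```

Under this model every chain describes a valid alignment, so its cost is an upper bound on the edit distance. The algorithms discussed in the abstract additionally handle overlapping anchors and avoid the quadratic scan over predecessors.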
Taxonomy
Topics: Fractal and DNA sequence analysis
