Practical colinear chaining on sequences revisited

Nicola Rizzo; Manuel Cáceres; Veli Mäkinen

arXiv:2506.11750·cs.DS·July 3, 2025

Open Access

TL;DR

This paper revisits colinear chaining algorithms, introduces an optimal method with improved guarantees, and demonstrates its effectiveness and minimal slowdown on real long read datasets.

Contribution

It develops an optimal colinear chaining algorithm with a proven average-case time bound, improving upon the previous practical solution (ChainX), which is not guaranteed to be optimal.

Findings

01

The new algorithm guarantees an optimal chaining cost in $O(n \cdot \mathrm{OPT} + n \log n)$ average-case time, where $\mathrm{OPT}$ is the optimal chaining cost.

02

ChainX can be suboptimal on realistic long read datasets.

03

The proposed method incurs minimal computational slowdown.

Abstract

Colinear chaining is a classical heuristic for sequence alignment and is widely used in modern practical aligners. Jain et al. (J. Comput. Biol. 2022) proposed an $O(n \log^{3} n)$ time algorithm to chain a set of $n$ anchors so that the chaining cost matches the edit distance of the input sequences, when anchors are all the maximal exact matches. Moreover, assuming a uniform and sparse distribution of anchors, they provided a practical solution (ChainX) working in $O(n \cdot \mathrm{SOL} + n \log n)$ average-case time, where $\mathrm{SOL}$ is the cost of the output chain. This practical solution is not guaranteed to be optimal: we study the failing cases, introduce the anchor diagonal distance, and find and implement an optimal algorithm working in $O(n \cdot \mathrm{OPT} + n \log n)$ average-case time, where $\mathrm{OPT} \leq \mathrm{SOL}$ is the optimal chaining cost. We…
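
To make the notion of a chaining cost concrete, here is a minimal sketch of colinear chaining as a quadratic dynamic program. It is not the algorithm of Jain et al., ChainX, or the method of this paper. It assumes a simplified cost model in which anchors in a chain may not overlap and the cost of a gap between consecutive anchors is the larger of the two gap lengths. The `Anchor` layout and the function name `chain_cost` are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical anchor layout: (query_start, target_start, length) of an exact match.
Anchor = Tuple[int, int, int]


def chain_cost(anchors: List[Anchor], qlen: int, tlen: int) -> int:
    """Minimum cost of a colinear chain under a simplified gap cost.

    Assumptions (not taken from the paper): anchors in a chain must not
    overlap in either sequence, every anchor has length >= 1, and a gap
    between consecutive anchors costs max(gap_in_query, gap_in_target).
    Runs in O(n^2) time, far slower than the bounds quoted above.
    """
    # Sorting by query start ensures every valid predecessor of anchor i
    # appears before it in the list.
    anchors = sorted(anchors)
    best = [0] * len(anchors)  # best[i]: min cost of a chain ending at anchor i

    for i, (q, t, _length) in enumerate(anchors):
        # Option 1: the chain starts here, paying for the unmatched prefixes.
        best[i] = max(q, t)
        # Option 2: extend a chain ending at an earlier, non-overlapping anchor.
        for j in range(i):
            qj, tj, lj = anchors[j]
            gap_q = q - (qj + lj)
            gap_t = t - (tj + lj)
            if gap_q >= 0 and gap_t >= 0:
                best[i] = min(best[i], best[j] + max(gap_q, gap_t))

    # The empty chain pays for both sequences in full.
    result = max(qlen, tlen)
    for i, (q, t, length) in enumerate(anchors):
        # Close the chain at anchor i, paying for the unmatched suffixes.
        tail = max(qlen - (q + length), tlen - (t + length))
        result = min(result, best[i] + tail)
    return result
```

For example, with anchors `(0, 0, 3)` and `(5, 4, 3)` on sequences of lengths 8 and 7, the chain that uses both anchors pays only for the gap between them, which is 2 under this cost model.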

Taxonomy

Topics: Fractal and DNA sequence analysis