LinearSankoff: Linear-time Simultaneous Folding and Alignment of RNA Homologs
Sizhen Li, Ning Dai, He Zhang, Apoorv Malik, David H. Mathews, and Liang Huang

TL;DR
LinearSankoff is a novel linear-time algorithm that combines folding and alignment of RNA homologs, significantly improving efficiency and accuracy, enabling analysis of large viral genomes like SARS-CoV-2.
Contribution
It introduces the first linear-time method for simultaneous RNA folding and alignment, integrating a Hidden Markov Model and beam search heuristics.
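As a rough illustration of the beam search idea mentioned above, the sketch below applies beam pruning to plain pairwise sequence alignment. It is not LinearSankoff's implementation: it does no folding and has no HMM, and the names `beam_prune` and `beam_align_score` and the scoring parameters are hypothetical choices made for this example. Cells (i, j) are visited in order of step s = i + j, and only the `beam_size` best-scoring cells per step are kept, so the work grows roughly linearly with sequence length for a fixed beam.

```python
def beam_prune(states, beam_size):
    """Keep only the beam_size highest-scoring states of a {state: score} dict."""
    if len(states) <= beam_size:
        return states
    kept = sorted(states.items(), key=lambda kv: kv[1], reverse=True)[:beam_size]
    return dict(kept)


def _update(frontier, cell, score):
    """Record score for cell if it beats the best score seen so far."""
    if score > frontier.get(cell, float("-inf")):
        frontier[cell] = score


def beam_align_score(a, b, beam_size=100, match=1, mismatch=-1, gap=-1):
    """Approximate global alignment score of a and b under beam pruning.

    frontiers[s] maps a cell (i, j) with i + j == s to the best score found
    for aligning a[:i] with b[:j]. With a large enough beam nothing is pruned
    and the result is the exact score; with a small beam it is a lower bound.
    """
    total = len(a) + len(b)
    frontiers = [dict() for _ in range(total + 1)]
    frontiers[0][(0, 0)] = 0
    for s in range(total + 1):
        frontier = beam_prune(frontiers[s], beam_size)  # the pruning step
        frontiers[s] = frontier
        for (i, j), score in frontier.items():
            if i < len(a):  # consume a[i] against a gap
                _update(frontiers[s + 1], (i + 1, j), score + gap)
            if j < len(b):  # consume b[j] against a gap
                _update(frontiers[s + 1], (i, j + 1), score + gap)
            if i < len(a) and j < len(b):  # align a[i] with b[j]
                sub = match if a[i] == b[j] else mismatch
                _update(frontiers[s + 2], (i + 1, j + 1), score + sub)
    return frontiers[total][(len(a), len(b))]


if __name__ == "__main__":
    print(beam_align_score("ACGU", "ACU", beam_size=100))
    print(beam_align_score("ACGU", "ACU", beam_size=1))
```

The beam size sets the trade-off: each step costs at most a sort over the cells reached from the previous kept frontiers, but a high-scoring alignment whose prefix scores poorly can be pruned away, so the result is a heuristic rather than a guaranteed optimum.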
Findings
Achieves linear runtime scaling with sequence length.
Improves alignment and secondary structure prediction accuracy.
Successfully analyzes large viral genomes within minutes.
Abstract
The classical Sankoff algorithm for the simultaneous folding and alignment of homologous RNA sequences is highly influential, but it suffers from two major limitations in efficiency and modeling power. First, it takes O(n^6) time for two sequences, where n is the average sequence length. Most implementations and variations reduce the runtime to O(n^3) by restricting the alignment search space, but this is still too slow for long sequences such as full-length viral genomes. Second, the Sankoff algorithm and all its existing implementations use a rather simplistic alignment model, which can result in poor alignment accuracy. To address these problems, we propose LinearSankoff, which seamlessly integrates the original Sankoff algorithm with a powerful Hidden Markov Model-based alignment module. This extension substantially improves alignment quality, which in turn benefits…
Taxonomy
Topics: RNA and protein synthesis mechanisms · Genomics and Phylogenetic Studies · Bacteriophages and microbial interactions
