Approximating edit distances between complex tandem repeats efficiently
Riki Kawahara, Shinichi Morishita

TL;DR
This paper introduces a fast algorithm to estimate edit distances between complex tandem repeats, which are linked to various diseases and evolutionary diversity.
Contribution
The main contribution is hEDDC, an efficient heuristic algorithm that estimates the edit distance with duplication and contraction of units (EDDC) with high accuracy and far faster than rigorous algorithms.
Findings
The distances estimated by the proposed algorithm have a Pearson correlation coefficient of >0.983 with the exact edit distances.
The heuristic algorithm runs orders of magnitude faster than traditional rigorous algorithms.
Abstract
Extended tandem repeats (TRs) have been associated with 60 or more diseases over the past 30 years. Although most TRs have single repeat units (or motifs), complex TRs with different units have recently been correlated with some brain disorders. Of note, a population-scale analysis shows that complex TRs at one locus can be divergent, and different units are often expanded between individuals. To understand the evolution of high TR diversity, it is informative to visualize a phylogenetic tree. To do this, we need to measure the edit distance between pairs of complex TRs by considering duplication and contraction of units created by replication slippage. However, traditional rigorous algorithms for this purpose are computationally expensive. We here propose an efficient heuristic algorithm to estimate the edit distance with duplication and contraction of units (EDDC, for short). We…
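To illustrate why duplication and contraction of units matter, the sketch below computes a plain Levenshtein distance between two hypothetical complex TRs. This is only a baseline for comparison, not the authors' EDDC or hEDDC algorithm: the function name, the example sequences, and the unit costs are all assumptions made for illustration, and the code does not model replication slippage.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance with unit-cost substitution, insertion, deletion.

    Baseline only: it does NOT model duplication or contraction of repeat
    units, so it is not the EDDC measure described in the abstract.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Hypothetical complex TRs built from two units (CAG and CCG).
x = "CAG" * 3 + "CCG" * 2
y = "CAG" * 5 + "CCG" * 2

print(levenshtein(x, y))  # 6: six single-base insertions
```

Here the two sequences differ by two extra copies of the CAG unit, yet the character-level distance counts six single-base insertions. A measure that treats each unit duplication or contraction as one operation would describe the same difference with fewer events, which is the motivation for EDDC stated in the abstract.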
Taxonomy
Topics: Genomics and Phylogenetic Studies · RNA and protein synthesis mechanisms · Algorithms and Data Compression
