Assessing the best edit in perturbation-based iterative refinement algorithms to compute the median string
P. Mirabal, J. Abreu, D. Seco

TL;DR
This paper introduces a new algorithm for finding median strings in biological sequence data that improves convergence speed over existing heuristics by better estimating the impact of perturbations.
Contribution
The paper presents a novel perturbation-based iterative refinement algorithm that outperforms current heuristics for median string computation in terms of convergence speed.
Findings
In the reported experiments, the new algorithm converges faster than state-of-the-art methods.
The approach maintains high quality of median string approximation.
Abstract
Strings are a natural representation of biological data such as DNA, RNA and protein sequences. The problem of finding a string that summarizes a set of sequences has direct application in relative compression algorithms for genome and proteome analysis, where reference sequences need to be chosen. Median strings have been used as representatives of a set of strings in different domains. However, several formulations of this problem are NP-complete. As an alternative, heuristic approaches have been proposed that iteratively refine an initial coarse solution by applying edit operations. Recently, we investigated the selection of the optimal edit operations to speed up convergence without spoiling the quality of the approximated median string. We propose a novel algorithm that outperforms state-of-the-art heuristic approximations to the median string in terms of convergence speed by…
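To make the idea of perturbation-based iterative refinement concrete, here is a minimal Python sketch of a generic greedy baseline. It is not the algorithm proposed in the paper. It assumes unit-cost Levenshtein distance, starts from the set median (the input string with the smallest total distance to the others), and at each step exhaustively tries every single-symbol insertion, deletion, and substitution, keeping the one that lowers the total distance the most. The function names (`edit_distance`, `total_distance`, `perturbations`, `approximate_median`) are hypothetical and chosen for illustration only.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def total_distance(candidate, strings):
    """Objective to minimize: sum of distances from candidate to every input."""
    return sum(edit_distance(candidate, s) for s in strings)


def perturbations(candidate, alphabet):
    """Yield every string that is one edit operation away from candidate."""
    for i in range(len(candidate)):
        yield candidate[:i] + candidate[i + 1:]              # deletion
        for c in alphabet:
            if c != candidate[i]:
                yield candidate[:i] + c + candidate[i + 1:]  # substitution
    for i in range(len(candidate) + 1):
        for c in alphabet:
            yield candidate[:i] + c + candidate[i:]          # insertion


def approximate_median(strings):
    """Greedy refinement (assumed baseline, not the paper's method)."""
    alphabet = sorted(set("".join(strings)))
    # Initial coarse solution: the set median.
    current = min(strings, key=lambda s: total_distance(s, strings))
    current_cost = total_distance(current, strings)
    while True:
        best, best_cost = None, current_cost
        for cand in perturbations(current, alphabet):
            cost = total_distance(cand, strings)
            if cost < best_cost:
                best, best_cost = cand, cost
        if best is None:  # no single edit improves the objective
            return current, current_cost
        current, current_cost = best, best_cost
```

For example, `approximate_median(["AAB", "ABA", "BAA"])` starts from a set median with total distance 4 and finishes with total distance 3. Evaluating every candidate edit with a full distance computation is what makes this baseline slow. Heuristics of the kind discussed in the abstract aim to rank or estimate the effect of edits so that fewer evaluations are needed.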
