A Three-Stage Algorithm for the Closest String Problem on Artificial and Real Gene Sequences
Alireza Abdi, Marko Djukanovic, Hesam Tahmasebi Boldaji, Hadis Salehi, Aleksandar Kartelj

TL;DR
This paper presents a novel three-stage algorithm for the NP-hard Closest String Problem. It improves solution quality for DNA and protein sequences through alphabet pruning, heuristic search, and local optimization, and is validated on real datasets.
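For reference, the Closest String Problem takes n strings of equal length m over an alphabet and asks for a string that minimizes the largest Hamming distance to any of the input strings. Below is a minimal Python sketch of that objective. The function names `hamming` and `csp_objective` are illustrative and are not taken from the paper.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def csp_objective(candidate: str, strings: list[str]) -> int:
    """Largest Hamming distance from candidate to any input string (to be minimized)."""
    return max(hamming(candidate, s) for s in strings)


if __name__ == "__main__":
    data = ["ACGT", "ACGA", "TCGT"]
    print(csp_objective("ACGT", data))  # 1
```

Every stage of a CSP solver is ultimately judged by this objective. Note that minimizing the maximum distance differs from minimizing the total distance, which would be solved exactly by a column-wise majority vote.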
Contribution
Introduces a three-stage algorithm with a new alphabet pruning method, a beam search variant, and local search for better solutions on biological sequences.
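The local search stage is not specified in detail here, so the sketch below is a generic stand-in and not the authors' procedure. It runs first-improvement hill climbing over single-character substitutions. Candidates are scored lexicographically by (maximum, total) Hamming distance. The total-distance tie-breaker is an assumption, added so that moves which leave the maximum unchanged can still make progress. The function names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def score(candidate: str, strings: list[str]) -> tuple[int, int]:
    """Lexicographic score (max distance, total distance); smaller is better."""
    dists = [hamming(candidate, s) for s in strings]
    return max(dists), sum(dists)


def local_search(start: str, strings: list[str], alphabet: str) -> str:
    """First-improvement hill climbing over single-character substitutions.

    Stops at a local optimum: no single substitution strictly improves the score.
    """
    current = list(start)
    best = score(start, strings)
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            for c in alphabet:
                if c == current[i]:
                    continue
                trial = current[:i] + [c] + current[i + 1:]
                new = score("".join(trial), strings)
                if new < best:  # tuple comparison is lexicographic
                    current, best = trial, new
                    improved = True
    return "".join(current)


if __name__ == "__main__":
    data = ["AAAA", "AAAT", "AATA"]
    print(local_search("TTTT", data, "ACGT"))  # AAAA
```

Every accepted move strictly decreases the score, so the loop terminates. Like any hill climber, however, it can stop at a local optimum. This is why such a step is typically applied to a good starting solution, for example one produced by a heuristic search.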
Findings
Outperforms previous methods on real-world datasets
Effective in reducing search space and improving solution quality
Validated on artificial and real gene sequences
Abstract
The Closest String Problem is an NP-hard problem that asks for a string minimizing the maximum distance to all strings in a given set. Its applications include coding theory, computational biology, and the design of degenerate primers, among others. Efficient exact algorithms already reach high-quality solutions for binary sequences; however, there is still room for improvement in solution quality for DNA and protein sequences. In this paper, we introduce a three-stage algorithm. First, we apply a novel alphabet pruning method that reduces the search space so that promising search regions can be found effectively. Second, we employ a variant of beam search to find a heuristic solution. This method utilizes a newly developed guiding function based on an expected distance heuristic score of…
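The first two stages described in the abstract, pruning followed by beam search, can be illustrated with a simplified sketch. This is not the authors' method. Their novel pruning rule and expected-distance guiding function are not detailed here, so the sketch swaps in two simpler stand-ins:

- **Pruning:** at each column, keep only the characters that occur there in some input string. This is safe. A character that occurs in no input at that column mismatches every string, so replacing it with one that does occur cannot increase any distance.
- **Guiding function:** rank partial solutions by their (maximum, total) partial Hamming distance.

The names `column_alphabets`, `beam_search`, and `beam_width` are hypothetical.

```python
def column_alphabets(strings: list[str]) -> list[list[str]]:
    """Characters occurring at each column of the input (sorted for determinism)."""
    m = len(strings[0])
    return [sorted({s[i] for s in strings}) for i in range(m)]


def beam_search(strings: list[str], beam_width: int = 10) -> str:
    """Build a candidate left to right, keeping the beam_width best prefixes.

    Simple stand-in guide: lowest (max partial distance, total partial distance).
    """
    m = len(strings[0])
    allowed = column_alphabets(strings)  # pruned per-column alphabets
    # Each beam node: (prefix, list of partial distances to each input string)
    beam = [("", [0] * len(strings))]
    for i in range(m):
        children = []
        for prefix, dists in beam:
            for c in allowed[i]:
                new_dists = [d + (s[i] != c) for d, s in zip(dists, strings)]
                children.append((prefix + c, new_dists))
        # Rank children by the guide; the prefix itself breaks remaining ties
        children.sort(key=lambda node: (max(node[1]), sum(node[1]), node[0]))
        beam = children[:beam_width]
    return beam[0][0]  # best complete string under the guide


if __name__ == "__main__":
    data = ["ACGT", "ACGA", "TCGT"]
    print(column_alphabets(data))  # [['A', 'T'], ['C'], ['G'], ['A', 'T']]
    print(beam_search(data, beam_width=1))  # ACGT
```

A wider beam explores more prefixes at a higher cost. The partial-distance guide is myopic because it ignores the columns not yet fixed. That weakness is the kind of limitation a lookahead score such as an expected-distance heuristic aims to address.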
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · semigroups and automata theory
Methods: Sparse Evolutionary Training · Pruning
