Text Searching Allowing for Non-Overlapping Adjacent Unbalanced Translocations
Domenico Cantone, Simone Faro, Arianna Pavone

TL;DR
This paper introduces algorithms for approximate string matching allowing non-overlapping unbalanced translocations of adjacent factors, addressing a gap in handling large-scale string modifications relevant in genetics, language, and music.
Contribution
It presents three novel algorithms with worst-case and average-case complexities for matching strings with this specific type of translocation operation.
Findings
Algorithms with O(nm^3) worst-case complexity
Improved average-case complexity of O(n log^2_sigma m)
Applications in genetics, language, and music analysis
Abstract
In this paper we investigate the approximate string matching problem when the allowed edit operations are non-overlapping unbalanced translocations of adjacent factors. Such edit operations take place when two adjacent substrings of the text swap, resulting in a modified string. The two substrings involved are allowed to have different lengths. Such large-scale modifications on strings have various applications. They are among the most frequent chromosomal alterations, accounting for 30% of all losses of heterozygosity, a major genetic event causing inactivation of cancer suppressor genes. In addition, among other applications, they are frequent modifications in musical and natural language information retrieval. However, despite their central role in so many fields of text processing, little attention has been devoted to the problem of matching…
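The edit operation described in the abstract can be illustrated with a small Python sketch. This is a brute-force illustration written for this summary, not one of the paper's algorithms, and it makes no attempt to reach the complexities listed under Findings. The names `translocate`, `matches_with_translocations`, and `find_occurrences` are hypothetical. The sketch assumes that a window of the text matches the pattern when it can be cut into blocks, each of which either equals the pattern position by position or equals the pattern with two adjacent factors (possibly of different lengths) swapped.

```python
def translocate(s, i, j, k):
    """Swap the adjacent factors s[i:j] and s[j:k] (hypothetical helper)."""
    if not 0 <= i < j < k <= len(s):
        raise ValueError("need 0 <= i < j < k <= len(s)")
    return s[:i] + s[j:k] + s[i:j] + s[k:]


def matches_with_translocations(x, y):
    """Return True if y can be obtained from x by non-overlapping
    translocations of adjacent factors (brute force, illustration only)."""
    if len(x) != len(y):
        return False
    m = len(x)
    # ok[k] is True when the prefix x[:k] can be turned into y[:k].
    ok = [False] * (m + 1)
    ok[0] = True
    for k in range(1, m + 1):
        # Case 1: the last character matches directly.
        if ok[k - 1] and x[k - 1] == y[k - 1]:
            ok[k] = True
            continue
        # Case 2: a translocation of x[i:j] and x[j:k] ends at position k.
        for i in range(k - 1):
            if not ok[i]:
                continue
            if any(y[i:k] == x[j:k] + x[i:j] for j in range(i + 1, k)):
                ok[k] = True
                break
    return ok[m]


def find_occurrences(pattern, text):
    """List every start position where a window of the text matches the pattern."""
    m = len(pattern)
    return [
        s
        for s in range(len(text) - m + 1)
        if matches_with_translocations(pattern, text[s:s + m])
    ]
```

For example, `translocate("abcde", 1, 2, 5)` swaps the factor `b` with the longer factor `cde` and yields `acdeb`, so the pattern `abcde` is reported at position 2 of the text `xxacdebyy`. Checking every window independently in this way is far slower than the bounds the paper reports; the sketch only makes the matching condition concrete.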
Taxonomy
Topics: Algorithms and Data Compression · Natural Language Processing Techniques · DNA and Biological Computing
