A Subsequence-Histogram Method for Generic Vocabulary Recognition over Deletion Channels
Majid Fozunbal

TL;DR
This paper introduces a polynomial-time approximation algorithm for recognizing which of two vocabularies produced a subsequence received over a deletion channel. It uses subsequence-histograms to distinguish between the vocabularies efficiently, without prior assumptions about their structure.
Contribution
The paper proposes a polynomial approximation algorithm that uses subsequence-histograms for vocabulary recognition. It avoids the exponential time or space of exact MAP solutions and requires no prior assumptions about vocabulary structure.
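To make the idea of a subsequence-histogram concrete, here is a minimal Python sketch. It assumes the histogram simply counts how often each length-k subsequence occurs across the words of a vocabulary. This summary does not give the paper's exact definition, so the function name subsequence_histogram, the parameter k, and the counting convention are all illustrative assumptions rather than the author's method.

```python
from collections import Counter
from itertools import combinations


def subsequence_histogram(vocabulary, k):
    """Count occurrences of every length-k subsequence across a vocabulary.

    Hypothetical helper: the paper's own histogram definition may differ.
    Each choice of k positions in a word contributes one count, so a
    subsequence that can be picked out in several ways is counted several times.
    """
    hist = Counter()
    for word in vocabulary:
        # Enumerate all increasing index tuples of size k (O(len(word)^k) of them).
        for positions in combinations(range(len(word)), k):
            hist["".join(word[i] for i in positions)] += 1
    return hist


# Off-line phase: build one histogram per candidate vocabulary.
hist_a = subsequence_histogram(["aab", "ab"], k=2)
hist_b = subsequence_histogram(["bba", "ba"], k=2)
```

For a fixed k the cost is polynomial in the word length, which is the kind of off-line processing the summary describes. How the histograms are combined in the recognition phase is not stated here, so the sketch stops at construction.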
Findings
The algorithm achieves MAP-like performance in some cases
Its effectiveness is demonstrated on example datasets
It is applicable to bioinformatics, storage, and search systems
Abstract
We consider the problem of recognizing a vocabulary--a collection of words (sequences) over a finite alphabet--from a potential subsequence of one of its words. We assume the given subsequence is received through a deletion channel as a result of transmitting a random word from one of two generic underlying vocabularies. An exact maximum a posteriori (MAP) solution for this problem counts the number of ways a given subsequence can be derived from particular subsets of the candidate vocabularies, requiring exponential time or space. We present a polynomial approximation algorithm for this problem. The algorithm makes no prior assumption about the rules and patterns governing the structure of the vocabularies. Instead, through off-line processing of the vocabularies, it extracts data regarding regularity patterns in the subsequences of each vocabulary. In the recognition phase, the algorithm…
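The counting at the heart of the exact MAP solution can be illustrated with a small brute-force sketch. It assumes an i.i.d. deletion channel with deletion probability d, a uniformly chosen word within each vocabulary, and vocabularies small enough to enumerate word by word. None of these assumptions is stated in the abstract, and the names count_embeddings, likelihood, and map_decide are hypothetical. This is the baseline the paper approximates, not its proposed algorithm.

```python
def count_embeddings(word, received):
    """Number of ways `received` can be obtained from `word` by deleting symbols."""
    # ways[j] = number of ways to match the first j symbols of `received` so far.
    ways = [1] + [0] * len(received)
    for ch in word:
        # Iterate j downwards so each symbol of `word` is used at most once.
        for j in range(len(received), 0, -1):
            if received[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[len(received)]


def likelihood(vocabulary, received, d):
    """P(received | vocabulary), assuming a uniform word choice and i.i.d. deletions."""
    total = 0.0
    for word in vocabulary:  # assumes a non-empty, explicitly listed vocabulary
        n_deleted = len(word) - len(received)
        if n_deleted < 0:
            continue  # a deletion channel cannot lengthen a word
        total += count_embeddings(word, received) * d**n_deleted * (1 - d) ** len(received)
    return total / len(vocabulary)


def map_decide(vocab_a, vocab_b, received, d=0.1, prior_a=0.5):
    """Return "A" or "B", whichever vocabulary has the larger posterior score."""
    score_a = prior_a * likelihood(vocab_a, received, d)
    score_b = (1 - prior_a) * likelihood(vocab_b, received, d)
    return "A" if score_a >= score_b else "B"


print(map_decide(["aab", "ab"], ["bba", "ba"], "ab"))
```

Counting embeddings in a single word takes time proportional to the product of the two lengths, but this sketch visits every word of both vocabularies, which is feasible only when they are small and explicitly listed.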
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Genomics and Phylogenetic Studies
