Fast detection of specific fragments against a set of sequences
Marie-Pierre Béal, Maxime Crochemore

TL;DR
This paper introduces alignment-free methods for identifying target-specific sequence fragments against a reference set: an automaton construction that runs in linear time and an occurrence-finding algorithm that runs in real time.
Contribution
It presents novel algorithms for computing target-specific factors and their occurrences, enabling fast, alignment-free sequence comparison against references.
Findings
The automaton accepting the target-specific factors is constructed in linear time.
The algorithm that finds all occurrences of target-specific factors in a single sequence runs in real time.
Methods are efficient for large sequence datasets.
Abstract
We design alignment-free techniques for comparing a sequence or word, called a target, against a set of words, called a reference. A target-specific factor of a target T against a reference R is a factor w of a word in T which is not a factor of a word of R and such that any proper factor of w is a factor of a word of R. We first address the computation of the set of target-specific factors of a target T against a reference R, where T and R are finite sets of sequences. The result is the construction of an automaton accepting the set of all considered target-specific factors. The construction algorithm runs in linear time according to the size of T ∪ R. The second result consists of the design of an algorithm to compute all the occurrences in a single sequence T of its target-specific factors against a reference R. The algorithm runs in real-time on the…
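To make the definition concrete, here is a minimal brute-force sketch in Python. It is not the authors' automaton construction and makes no attempt at linear or real-time behavior; it enumerates all substrings and applies the definition directly. The function names (`factors`, `target_specific_factors`, `occurrences`) are hypothetical, and the reference is assumed to contain at least one word.

```python
def factors(words):
    """Return the set of all factors (substrings) of the given words,
    including the empty word."""
    result = {""}
    for word in words:
        for i in range(len(word)):
            for j in range(i + 1, len(word) + 1):
                result.add(word[i:j])
    return result


def target_specific_factors(target, reference):
    """Naive illustration of the definition (assumes a non-empty reference).

    A factor w of a target word is target-specific when w is not a factor
    of any reference word while every proper factor of w is one.
    """
    ref_factors = factors(reference)
    specific = set()
    for w in factors(target):
        if w in ref_factors:
            continue
        # A set of factors is closed under taking factors, so it is enough
        # to test the two longest proper factors of w.
        if w[1:] in ref_factors and w[:-1] in ref_factors:
            specific.add(w)
    return specific


def occurrences(sequence, reference):
    """List (start position, factor) pairs for every occurrence in
    `sequence` of one of its target-specific factors."""
    specific = target_specific_factors([sequence], reference)
    return sorted(
        (i, w)
        for w in specific
        for i in range(len(sequence))
        if sequence.startswith(w, i)
    )


if __name__ == "__main__":
    print(target_specific_factors(["abba"], ["aab", "bb"]))
    print(occurrences("abba", ["aab", "bb"]))
```

With the target `abba` and the reference `{aab, bb}`, the factors `ba` and `abb` are absent from the reference while all of their proper factors occur in it, so they are the two target-specific factors. `bba` is also absent from the reference but is excluded because its proper factor `ba` is absent too.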
Taxonomy
Topics: Algorithms and Data Compression · Handwritten Text Recognition Techniques · Semigroups and Automata Theory
