Linear-Time Sequence Comparison Using Minimal Absent Words & Applications
Maxime Crochemore, Gabriele Fici, Robert Mercaş, Solon P. Pissis

TL;DR
This paper introduces a novel linear-time algorithm for comparing sequences based on their minimal absent words, offering an efficient alignment-free method with applications to circular sequences.
Contribution
It presents the first linear-time, linear-space algorithm for sequence comparison using all minimal absent words, advancing alignment-free genomic analysis techniques.
Findings
Algorithm operates in linear time and space
Effective comparison of circular sequences demonstrated
Provides combinatorial insights into minimal absent words
Abstract
Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realized by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as q-gram distance, are usually computed in time linear with respect to the length of the sequences. In this article, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an absent word of some sequence if it does not occur in the sequence. An absent word is minimal if all its proper factors occur in the sequence. Here we present the first linear-time and linear-space algorithm to compare two sequences by…
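To make the definition of a minimal absent word concrete, here is a minimal brute-force sketch in Python. It is not the paper's linear-time algorithm: it enumerates all factors explicitly and so takes at least quadratic time and space. The function names and the explicit `alphabet` parameter are assumptions made for illustration. The sketch relies on the observation that a word a·u·b (with a, b single letters) is a minimal absent word exactly when it is absent while both a·u and u·b occur.

```python
from itertools import product


def factors(x):
    """Return all factors (substrings) of x, including the empty word."""
    return {x[i:j] for i in range(len(x) + 1) for j in range(i, len(x) + 1)}


def minimal_absent_words(x, alphabet):
    """Brute-force set of minimal absent words of x over `alphabet`.

    Illustrative only: this is NOT the linear-time method of the paper.
    """
    fac = factors(x)
    maws = set()
    # Words of length >= 2: a+u+b is absent, but a+u and u+b both occur.
    for u in fac:
        for a, b in product(alphabet, repeat=2):
            if a + u in fac and u + b in fac and a + u + b not in fac:
                maws.add(a + u + b)
    # Words of length 1: a letter that never occurs is minimal absent,
    # because its only proper factor is the empty word.
    maws.update(c for c in alphabet if c not in fac)
    return maws


maws_x = minimal_absent_words("abaab", "ab")
maws_y = minimal_absent_words("abab", "ab")
print(sorted(maws_x, key=lambda w: (len(w), w)))  # ['bb', 'aaa', 'bab', 'aaba']

# One possible starting point for comparing two sequences is the set of
# minimal absent words they do not share (an illustrative assumption here).
print(sorted(maws_x ^ maws_y, key=lambda w: (len(w), w)))
```

For "abaab" over the alphabet {a, b}, the word "bab" qualifies because "ba" and "ab" both occur while "bab" does not; "abb" does not qualify because its proper factor "bb" is itself absent.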
Taxonomy
Topics: Algorithms and Data Compression · Genomics and Phylogenetic Studies · Machine Learning in Bioinformatics
