Linear-time Computation of Minimal Absent Words Using Suffix Array
Carl Barton, Alice Heliou, Laurent Mouchard, Solon P. Pissis

TL;DR
This paper introduces a linear-time, linear-space algorithm that computes all minimal absent words of a string using suffix arrays. In the reported experiments, its implementation runs faster than previous methods.
Contribution
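The paper presents the first linear-time and linear-space suffix array-based algorithm for minimal absent words. This closes the gap between the O(n)-time suffix automaton algorithm (Crochemore et al., 1998), which has no public implementation, and the O(n^2)-time suffix array algorithm (Pinho et al., 2009).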
Findings
The new algorithm outperforms previous suffix array-based methods in speed.
Experimental results confirm the efficiency of the implementation on real and synthetic data.
Faster computation of minimal absent words supports their use as a fast alternative for measuring approximation in sequence comparison in genomic studies.
Abstract
An absent word of a word y of length n is a word that does not occur in y. It is a minimal absent word if all its proper factors occur in y. Minimal absent words have been computed in genomes of organisms from all domains of life; their computation provides a fast alternative for measuring approximation in sequence comparison. There exists an O(n)-time and O(n)-space algorithm for computing all minimal absent words on a fixed-sized alphabet based on the construction of suffix automata (Crochemore et al., 1998). No implementation of this algorithm is publicly available. There also exists an O(n^2)-time and O(n)-space algorithm for the same problem based on the construction of suffix arrays (Pinho et al., 2009). An implementation of this algorithm was also provided by the authors and is currently the fastest available. In this article, we bridge this unpleasant gap by presenting an…
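To make the definition in the abstract concrete, here is a minimal brute-force sketch in Python. It is not the paper's suffix array algorithm and is far from linear time: it enumerates every factor of y and tests the definition directly. The function name `minimal_absent_words` and the `alphabet` parameter are assumptions made for illustration. The sketch relies on the observation that a word is a minimal absent word exactly when it is absent while both its longest proper prefix and its longest proper suffix occur in y, since every other proper factor lies inside one of those two.

```python
def minimal_absent_words(y, alphabet=None):
    """Brute-force minimal absent words of y (illustrative only, not linear time).

    `alphabet` is a hypothetical parameter: it defaults to the letters of y,
    in which case no single-letter absent word can be reported.
    """
    if alphabet is None:
        alphabet = sorted(set(y))
    n = len(y)
    # All factors (substrings) of y, including the empty word.
    factors = {y[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    result = set()
    for x in factors:            # x is the longest proper prefix; it occurs in y
        for b in alphabet:
            w = x + b
            # w must be absent, and its longest proper suffix must occur in y.
            if w not in factors and w[1:] in factors:
                result.add(w)
    return sorted(result)


print(minimal_absent_words("abaab"))
```

For y = abaab over the alphabet {a, b}, this prints `['aaa', 'aaba', 'bab', 'bb']`. Each of these words is absent from y, while removing its first or last letter yields a factor of y.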
Taxonomy
Topics: Algorithms and Data Compression · Genomics and Phylogenetic Studies · RNA and protein synthesis mechanisms
