Efficient Computation of Periods and Covers Using Sampling
Thierry Lecroq, Francesco Pio Marino

TL;DR
This paper applies Characters-Distance-Sampling (CDS) to the computation of string periods and shortest covers, reporting speedups of 38%-43% for periods and 63%-72% for covers, with applications in text analysis and biology.
Contribution
The paper applies the CDS technique with a single pivot to optimize computation of string periods and covers, achieving substantial speedups over traditional methods.
Findings
Speedups of 38%-43% for period computation
Speedups of 63%-72% for cover detection
Demonstrates effectiveness of CDS-based representations
Abstract
Identifying regularities in strings, such as periods and covers, is crucial for applications in text compression, computational biology, and pattern recognition. Characters-Distance-Sampling (CDS) is an efficient technique that encodes a string by storing distances between selected pivot characters, accelerating string-processing tasks. We apply CDS to compute periods and shortest covers, selecting only the first character as the pivot. This strategy yields optimized computations, achieving speedups of 38%-43% for period computation and 63%-72% for cover detection. These results demonstrate the potential of CDS-based representations for efficient string analysis and broader applications.
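The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of how a single-pivot sampling could prune the work, not the authors' implementation. It relies on one fact that holds for any string: a period p < n, and the start of any occurrence of a cover, must fall on a position where the first character occurs. The function names (pivot_positions, distance_sample, periods, shortest_cover) are hypothetical and chosen for illustration, and the verification step uses plain substring comparison rather than whatever the paper uses.

```python
def pivot_positions(s):
    """Positions where the first character of s (the single pivot) occurs."""
    if not s:
        return []
    pivot = s[0]
    return [i for i, c in enumerate(s) if c == pivot]


def distance_sample(s):
    """Distances between consecutive pivot occurrences (the sampled view)."""
    pos = pivot_positions(s)
    return [b - a for a, b in zip(pos, pos[1:])]


def periods(s):
    """All periods p of s with 1 <= p <= len(s), in increasing order.

    Only pivot positions are tested, because a period p < len(s)
    requires s[p] == s[0].
    """
    n = len(s)
    result = []
    for p in pivot_positions(s)[1:]:
        # p is a period iff the suffix s[p:] is also a prefix of s.
        if s.startswith(s[p:]):
            result.append(p)
    if n:
        result.append(n)  # the trivial period
    return result


def shortest_cover(s):
    """Shortest string w whose occurrences cover every position of s."""
    n = len(s)
    if n == 0:
        return ""
    pos = pivot_positions(s)
    # A cover is a border of s (or s itself); border lengths are n - p.
    lengths = sorted(n - p for p in periods(s) if p < n) + [n]
    for length in lengths:
        w = s[:length]
        covered = 0  # s[:covered] is covered so far
        for i in pos:  # occurrences of w can only start at pivot positions
            if i > covered:
                break  # position `covered` can no longer be covered
            if s.startswith(w, i):
                covered = i + length
        if covered == n:
            return w
    return s
```

For instance, distance_sample("abcabcab") gives [3, 3], periods("abcabcab") gives [3, 6, 8], and shortest_cover("abaababaaba") gives "aba". The pruning helps most when the first character is rare in the string; for a string such as "aaaa" every position is a pivot and nothing is skipped.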
Taxonomy
Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · Natural Language Processing Techniques
