Sparse Long Blocks and the Micro-Structure of the Longest Common Subsequences
S. Amsalu, C. Houdré, H. Matzinger

TL;DR
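This paper studies how an optimal Longest Common Subsequence (LCS) alignment of two random strings treats a long constant block inserted into one of them. The behavior depends on the alphabet size: with two letters the block is mainly aligned with matching symbols, while with three or more letters it is mainly aligned with gaps. Simulations support the theoretical findings.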
Contribution
It shows that, for sufficiently long strings, the way an optimal alignment treats an inserted long constant block differs sharply between two-letter alphabets and alphabets with three or more letters.
Findings
With a two-letter alphabet, the long constant block is mainly aligned with the same symbol in the other string.
With an alphabet of three or more letters, the long constant block is mainly aligned with gaps.
Simulations confirm the theoretical difference in gap proportions for different alphabet sizes.
Abstract
Consider two random strings having the same length and generated by an iid sequence taking its values uniformly in a fixed finite alphabet. Artificially place a long constant block into one of the strings, where a constant block is a contiguous substring consisting only of one type of symbol. The long block replaces a segment of equal size and its length is smaller than the length of the strings, but larger than its square-root. We show that for sufficiently long strings the optimal alignment corresponding to a Longest Common Subsequence (LCS) treats the inserted block very differently depending on the size of the alphabet. For two-letter alphabets, the long constant block gets mainly aligned with the same symbol from the other string, while for three or more letters the opposite is true and the block gets mainly aligned with gaps. We further provide simulation results on the…
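The setup described in the abstract can be tried numerically. The following is a minimal sketch, not the authors' simulation code: it draws two iid uniform strings, overwrites a middle segment of one with a constant block, computes an LCS by the standard dynamic program, and reports the fraction of block symbols that are matched rather than aligned with gaps. The function names, the placement of the block in the middle, the parameter values, and the tie-breaking rule in the backtrack are all assumptions made for illustration. An LCS generally has many optimal alignments, and this sketch inspects only one of them.

```python
import random


def lcs_table(x, y):
    """Standard O(len(x) * len(y)) dynamic-programming table for the LCS length."""
    n, m = len(x), len(y)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        row, prev = table[i], table[i - 1]
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                row[j] = prev[j - 1] + 1
            else:
                row[j] = max(prev[j], row[j - 1])
    return table


def matched_positions(x, y):
    """Backtrack ONE optimal alignment and return the indices of x that are matched."""
    table = lcs_table(x, y)
    i, j = len(x), len(y)
    matched = []
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            matched.append(i - 1)
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1  # x[i-1] is aligned with a gap (assumed tie-breaking rule)
        else:
            j -= 1  # y[j-1] is aligned with a gap
    matched.reverse()
    return matched


def block_match_fraction(n, block_len, k, rng):
    """Fraction of an inserted constant block matched in one optimal alignment.

    n, block_len, k (alphabet size) and the middle placement are illustrative choices.
    """
    x = [rng.randrange(k) for _ in range(n)]
    y = [rng.randrange(k) for _ in range(n)]
    start = (n - block_len) // 2
    x[start:start + block_len] = [0] * block_len  # block replaces an equal-size segment
    matched = matched_positions(x, y)
    in_block = sum(1 for i in matched if start <= i < start + block_len)
    return in_block / block_len


if __name__ == "__main__":
    rng = random.Random(0)
    n, block_len, trials = 400, 60, 5  # block_len lies between sqrt(n) = 20 and n
    for k in (2, 3, 4):
        avg = sum(block_match_fraction(n, block_len, k, rng) for _ in range(trials)) / trials
        print(f"alphabet size {k}: average matched fraction of block = {avg:.2f}")
```

The paper's result is asymptotic, so small values of `n` like the ones above may not show the predicted contrast clearly. Larger strings and more trials would be needed for a meaningful comparison, at quadratic cost per run.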
Taxonomy
Topics: Algorithms and Data Compression · Bayesian Methods and Mixture Models · Stochastic processes and statistical mechanics
