Length of the longest common subsequence between overlapping words
Boris Bukh, Raymond Hogenson

TL;DR
This paper studies the expected length of the longest common subsequence (LCS) of two random sequences that overlap, showing that it is approximately the maximum of the overlap length and the expected LCS length of two independent random sequences. Tail bounds are also provided.
Contribution
It introduces a new analysis of LCS length for overlapping sequences, connecting it to independent sequence LCS expectations and providing probabilistic tail bounds.
Findings
Expected LCS length approximates max(overlap, independent LCS)
Derived tail bounds for LCS length in overlapping sequences
Established theoretical relationship between overlap and LCS length
Abstract
Given two random finite sequences from $[k]^n$ such that a prefix of the first sequence is a suffix of the second, we examine the length of their longest common subsequence. If $\ell$ is the length of the overlap, we prove that the expected length of an LCS is approximately $\max(\ell, \mathbb{E}[L_n])$, where $L_n$ is the length of an LCS between two independent random sequences. We also obtain tail bounds on this quantity.
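The setup in the abstract can be explored numerically. The following is a minimal simulation sketch, not the authors' method: it assumes symbols are drawn uniformly and independently from an alphabet of size `k`, that both words have length `n`, and that the first `overlap` symbols of the first word coincide with the last `overlap` symbols of the second. The function names `lcs_length`, `overlapping_pair`, and `mean_lcs` are hypothetical, chosen for illustration.

```python
import random


def lcs_length(a, b):
    """Length of a longest common subsequence, via the standard O(|a||b|) DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def overlapping_pair(n, overlap, k, rng):
    """Two length-n words over {0, ..., k-1} (assumed uniform i.i.d. symbols)
    where a prefix of the first equals a suffix of the second."""
    shared = [rng.randrange(k) for _ in range(overlap)]
    first = shared + [rng.randrange(k) for _ in range(n - overlap)]
    second = [rng.randrange(k) for _ in range(n - overlap)] + shared
    return first, second


def mean_lcs(n, overlap, k, trials, seed=0):
    """Monte Carlo estimate of the expected LCS length for overlapping words."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a, b = overlapping_pair(n, overlap, k, rng)
        total += lcs_length(a, b)
    return total / trials
```

Calling `mean_lcs(n, 0, k, trials)` estimates the independent case, so comparing it with `mean_lcs(n, overlap, k, trials)` for increasing `overlap` gives a rough empirical view of the maximum described in the abstract. Since the shared block is itself a common subsequence, every sample has LCS length at least `overlap`.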
