Extensive Simulations for Longest Common Subsequences: Finite Size Scaling, a Cavity Solution, and Configuration Space Properties
J. Boutet de Monvel

TL;DR
This paper uses extensive simulations to analyze the statistical properties and scaling laws of the Longest Common Subsequence problem, revealing new insights into its universality class, solution space, and correlations.
Contribution
It introduces a finite size scaling law for the mean LCS length, provides precise estimates of key constants, and explores the problem's ground state and solution space properties.
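Once E(L)/N has been estimated at several sizes N, a constant such as C can be extracted by fitting the leading form of the scaling law, E(L)/N = C + A/(N^{1/2} ln N). The sketch below is an assumption and not the paper's stated procedure. It performs an unweighted least-squares fit of C and A, uses x = 1/(N^{1/2} ln N) as the regressor, and ignores the higher-order corrections. The name `fit_scaling` is hypothetical. A careful analysis would likely also weight each point by its statistical error.

```python
import math


def fit_scaling(ns, ratios):
    """Unweighted least-squares fit of ratios ~ C + A * x, where
    x = 1 / (sqrt(N) * ln N) is the leading finite size correction.

    Hypothetical helper: higher-order corrections are ignored.
    Requires N >= 2 (so ln N > 0) and at least two distinct N values.
    Returns the pair (C, A).
    """
    xs = [1.0 / (math.sqrt(n) * math.log(n)) for n in ns]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ratios) / k
    # Ordinary least squares for a straight line y = C + A * x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ratios))
    a = sxy / sxx
    c = mean_y - a * mean_x
    return c, a
```

As a sanity check, inputs generated exactly from the law with arbitrary values C = 0.5 and A = 1.0 are recovered by the fit. These values are placeholders chosen for illustration, not results from the paper.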
Findings
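Finite size scaling law $E(L)/N = C + A/(N^{1/2}\ln N) + \dots$, with precise estimates of $C$ for $S$ between 2 and 15
Expression for the limit of $L(N,M)/N$ as $N \to \infty$ in the Bernoulli Matching model, as a function of $r = M/N$
Nonzero residual entropy, indicating that the number of optimal solutions grows exponentially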
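The Bernoulli Matching model differs from the random string model only in how the N x M match array is built. Here each cell is occupied independently with probability p = 1/S, whereas for two random strings cell (i, j) is occupied exactly when a_i = b_j, so the entries are correlated. A minimal Monte Carlo sketch of L(N, M)/N follows. It assumes that L is the longest chain of occupied cells strictly increasing in both coordinates, computed with the same recursion as the LCS. The names `bernoulli_lcs` and `bernoulli_ratio` are hypothetical.

```python
import random


def bernoulli_lcs(n, m, p, rng):
    """L(N, M) for one sample of the Bernoulli Matching model.

    Each cell (i, j) of an n x m array is occupied independently with
    probability p; the result is the longest chain of occupied cells that
    increases strictly in both i and j. Only one row is kept in memory.
    """
    prev = [0] * (m + 1)
    for _ in range(n):
        curr = [0] * (m + 1)
        for j in range(1, m + 1):
            occupied = 1 if rng.random() < p else 0
            # Skip row i, skip column j, or extend the chain through (i, j).
            curr[j] = max(prev[j], curr[j - 1], prev[j - 1] + occupied)
        prev = curr
    return prev[m]


def bernoulli_ratio(n, r, s, trials, seed=0):
    """Monte Carlo estimate of L(N, M)/N with M = round(r * N), p = 1/s."""
    rng = random.Random(seed)
    m = round(r * n)
    total = sum(bernoulli_lcs(n, m, 1.0 / s, rng) for _ in range(trials))
    return total / (trials * n)
```

Sweeping `r` at fixed `s` and growing `n` gives finite-size estimates of the limit of L(N, M)/N as a function of r = M/N, which can then be compared with the corresponding random string estimates.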
Abstract
The Longest Common Subsequence (LCS) Problem asks for the longest sequence of (non-contiguous) matches between two given strings of characters. Using extensive Monte Carlo simulations, we find a finite size scaling law of the form $E(L)/N = C + A/(N^{1/2}\ln N) + \dots$ for the mean LCS length of two random strings of size $N$ over $S$ letters. We provide precise estimates of $C$ for $S$ between 2 and 15. We also consider a related Bernoulli Matching model, where the different entries of an $N \times M$ array are independently occupied with probability $1/S$. In that case we find the expression of the limit of $L(N,M)/N$ as $N$ grows to infinity, as a function of $r = M/N$. This expression provides a very good approximation for the Random String model, which becomes increasingly accurate as $S$ increases. The question of the "universality class" of the LCS problem is also considered. For the Bernoulli Matching model we…
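The Monte Carlo estimate of E(L)/N for the Random String model can be sketched as follows. This assumes the standard quadratic-time dynamic program for the LCS length and letters drawn uniformly from an alphabet of S symbols. The function names `lcs_length` and `mean_lcs_ratio` are illustrative, not taken from the paper, whose simulation code may be organized differently (for example to reach much larger N).

```python
import random


def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b.

    Classic O(len(a) * len(b)) dynamic program keeping a single row.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the best subsequence of the two shorter prefixes.
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def mean_lcs_ratio(n, s, trials, seed=0):
    """Monte Carlo estimate of E(L)/N for two independent uniform random
    strings of length n over s letters (coded as the integers 0..s-1)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = [rng.randrange(s) for _ in range(n)]
        b = [rng.randrange(s) for _ in range(n)]
        total += lcs_length(a, b)
    return total / (trials * n)
```

For instance, `mean_lcs_ratio(1000, 2, 20)` gives a finite-N estimate for S = 2. Because the leading correction decays only like 1/(N^{1/2} ln N), such estimates approach C slowly, which is why the dependence on N must be extrapolated rather than read off at a single size.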
Taxonomy
Topics: Algorithms and Data Compression · Stochastic processes and statistical mechanics · Bayesian Methods and Mixture Models
