String Matching and 1d Lattice Gases
Muhittin Mungan

TL;DR
This paper models string matching probabilities using a 1D lattice gas analogy, deriving a single distribution that covers the known Gaussian and compound Poisson asymptotic regimes and the regime between them, and that extends to non-uniform letter distributions and Markov chains.
Contribution
It introduces a novel lattice gas framework for calculating string matching probabilities, capturing intermediate regimes and generalizing to non-uniform letter distributions and Markov chains.
Findings
Reproduces distribution behavior across all regimes
Analytically derives the emergence of limiting distributions
Extends analysis to complex stochastic string models
Abstract
We calculate the probability distributions for the number of occurrences $n$ of a given $l$-letter word in a random string of $k$ letters. Analytical expressions for the distribution are known for the asymptotic regimes (i) $k \gg r^l \gg 1$ (Gaussian) and (ii) $k, l \to \infty$ such that $k/r^l$ is finite (Compound Poisson). However, it is known that these distributions do not work well in the intermediate regime $k \gtrsim r^l \gtrsim 1$. We show that the problem of calculating the string matching probability can be cast as that of determining the configurational partition function of a 1d lattice gas with interacting particles, so that the matching probability becomes the grand-partition sum of the lattice gas, with the number of particles corresponding to the number of matches. We perform a virial expansion of the effective equation of state and obtain the probability distribution. Our result…
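To make the quantity in the abstract concrete, the following is a minimal sketch that computes the exact distribution of the number of (possibly overlapping) occurrences of a word in a uniformly random string, for small sizes. It is not the lattice gas or virial method of the paper: it is a standard dynamic program over the Knuth-Morris-Pratt matching automaton, swapped in here only as a brute-force reference. The function names `kmp_automaton` and `match_count_distribution` are hypothetical, and the symbols are assumed to mean: `word` has length l, `alphabet` has r letters, and `k` is the string length.

```python
from fractions import Fraction


def kmp_automaton(word, alphabet):
    """Build delta[state][letter] -> next state.

    A state is the length of the longest prefix of `word` that is a
    suffix of the text read so far (0..len(word)).
    """
    l = len(word)
    # fail[i] = length of the longest proper border of word[:i]
    fail = [0] * (l + 1)
    j = 0
    for i in range(1, l):
        while j > 0 and word[i] != word[j]:
            j = fail[j]
        if word[i] == word[j]:
            j += 1
        fail[i + 1] = j

    delta = [dict() for _ in range(l + 1)]
    for state in range(l + 1):
        for a in alphabet:
            if state < l and word[state] == a:
                delta[state][a] = state + 1
            elif state == 0:
                delta[state][a] = 0
            else:
                # fall back to the border state, which is already filled in
                delta[state][a] = delta[fail[state]][a]
    return delta


def match_count_distribution(word, alphabet, k):
    """Return {n: P(n)} for the number n of occurrences of `word`
    in a string of k letters drawn uniformly and independently."""
    l = len(word)
    delta = kmp_automaton(word, alphabet)
    p = Fraction(1, len(alphabet))
    # dist[(state, n)] = probability of being in `state` with n matches so far
    dist = {(0, 0): Fraction(1)}
    for _ in range(k):
        new = {}
        for (state, n), prob in dist.items():
            for a in alphabet:
                s2 = delta[state][a]
                n2 = n + (1 if s2 == l else 0)
                new[(s2, n2)] = new.get((s2, n2), 0) + prob * p
        dist = new
    out = {}
    for (_, n), prob in dist.items():
        out[n] = out.get(n, 0) + prob
    return out


if __name__ == "__main__":
    for n, prob in sorted(match_count_distribution("aa", "ab", 3).items()):
        print(n, prob)
```

Exact rational arithmetic is used so that small cases can be checked by hand. For any word, the mean of the returned distribution should equal (k - l + 1) / r^l by linearity of expectation, which gives a quick sanity check; the shape of the distribution, however, depends on how the word overlaps with itself.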
