Joint String Complexity for Markov Sources: Small Data Matters
Philippe Jacquet, Dimitris Milioris, Wojciech Szpankowski

TL;DR
This paper analyzes the joint string complexity of two strings generated by Markov sources. It shows linear growth when the sources are statistically indistinguishable and sublinear growth when they differ, using singularity analysis and the saddle point method.
Contribution
The paper introduces joint string complexity, the number of distinct words common to two strings, and gives an asymptotic analysis for Markov sources showing how its growth depends on whether the sources are statistically distinguishable.
Findings
Joint string complexity grows linearly for statistically indistinguishable sources.
Joint string complexity grows sublinearly when sources are statistically different.
Singularity analysis and the saddle point method over infinitely many saddle points reveal oscillatory phenomena in the asymptotics.
Abstract
String complexity is defined as the cardinality of the set of all distinct words (factors) of a given string. For two strings, we introduce the joint string complexity as the cardinality of the set of words that are common to both strings. String complexity finds a number of applications, from capturing the richness of a language to finding similarities between two genome sequences. In this paper we analyze the joint string complexity when both strings are generated by Markov sources. We prove that the joint string complexity grows linearly (in terms of the string lengths) when both sources are statistically indistinguishable and sublinearly when the sources are statistically not the same. Precise analysis of the joint string complexity turns out to be quite challenging, requiring subtle singularity analysis and the saddle point method over infinitely many saddle points leading to novel oscillatory…
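The two definitions in the abstract can be illustrated with a minimal brute-force sketch. This is not the authors' method, only a direct reading of the definitions. The function names are hypothetical, and the sketch assumes that only non-empty words are counted, which the abstract does not specify.

```python
def distinct_factors(s):
    """Return the set of all distinct non-empty substrings (factors) of s.

    Assumption: the empty word is excluded from the count.
    """
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def string_complexity(s):
    """Number of distinct non-empty factors of a single string."""
    return len(distinct_factors(s))


def joint_string_complexity(x, y):
    """Number of distinct non-empty factors common to both strings."""
    return len(distinct_factors(x) & distinct_factors(y))


# "abab" has factors a, b, ab, ba, aba, bab, abab.
print(string_complexity("abab"))                # 7
# "abab" and "abba" share the factors a, b, ab, ba.
print(joint_string_complexity("abab", "abba"))  # 4
```

This enumeration builds a quadratic number of substrings, so it is only suitable for short strings. For long sequences, a suffix tree or suffix automaton would be the usual choice.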
Taxonomy
Topics: Algorithms and Data Compression · Fractal and DNA sequence analysis · Bayesian Methods and Mixture Models
