Constructions for Clumps Statistics

Frederique Bassino; Julien Clement; Julien Fayolle; Pierre Nicodeme

arXiv:0804.3671 · cs.DM · April 24, 2008 · 5 cites

Open Access

TL;DR

This paper develops a combinatorial approach to analyze clump statistics in words, providing exact results for short sequences, complementing existing probabilistic methods that focus on asymptotic behavior.

Contribution

It introduces a combinatorial framework for clump analysis, filling the gap left by probabilistic approaches and enabling exact calculations for short sequences.

Findings

01 Provides a combinatorial method for clump statistics
02 Enables exact calculations for short sequences
03 Complements probabilistic asymptotic results

Abstract

We consider a component of word statistics known as the clump: starting from a finite set of words, clumps are maximal sets of overlapping occurrences of these words in a text. This parameter was first studied by Schbath with the aim of counting the number of occurrences of words in random texts. Later work with a similar probabilistic approach used the Chen-Stein approximation for a compound Poisson distribution, where the number of clumps follows a law close to Poisson. Until now there has been no combinatorial counterpart to this approach, and we fill the gap here. We emphasize that, in contrast with the probabilistic approach, which provides only asymptotic results, the combinatorial approach provides exact results that are useful when considering short sequences.
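
To make the notion of a clump concrete, here is a minimal Python sketch. It rests on assumptions that the abstract does not spell out: two occurrences are taken to belong to the same clump when they share at least one text position (occurrences that merely abut are kept separate), and all pattern words are non-empty. The function names `find_occurrences`, `clumps`, and `clump_count_distribution` are hypothetical, chosen for illustration. The last function obtains exact counts for short sequences by brute-force enumeration, which is not the paper's combinatorial construction, only a naive way to check small cases.

```python
from collections import Counter
from itertools import product


def find_occurrences(text, words):
    """Return sorted (start, end) spans, end exclusive, of every occurrence."""
    spans = []
    for w in words:
        start = text.find(w)
        while start != -1:
            spans.append((start, start + len(w)))
            # Step by one position so overlapping occurrences are found too.
            start = text.find(w, start + 1)
    return sorted(spans)


def clumps(text, words):
    """Merge occurrences that share a position (assumed clump definition)."""
    merged = []
    for start, end in find_occurrences(text, words):
        if merged and start < merged[-1][1]:
            # Overlaps the current clump: extend its right end if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # No shared position with the previous clump: start a new one.
            merged.append((start, end))
    return merged


def clump_count_distribution(words, alphabet, n):
    """Exact number of length-n texts per clump count, by brute force."""
    dist = Counter()
    for letters in product(alphabet, repeat=n):
        dist[len(clumps("".join(letters), words))] += 1
    return dist
```

For example, `clumps("aaabaa", ["aa"])` groups the three occurrences of `aa` into two clumps, and `clump_count_distribution(["aa"], "ab", 3)` tabulates all eight binary texts of length 3 by their number of clumps. The enumeration costs time exponential in `n`, so it is only practical for very short sequences.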

Taxonomy

Topics: Semigroups and Automata Theory · Algorithms and Data Compression · Advanced Combinatorial Mathematics