Average Size of a Suffix Tree for Markov Sources

Philippe Jacquet; Wojciech Szpankowski

arXiv:1605.02123 · cs.DS · May 10, 2016 · 1 cites

Open Access

TL;DR

This paper analyzes the average size of suffix trees generated from Markov sources, revealing an asymptotic equivalence to tries built from independent sequences, and provides a formula for trie size under Markov models.

Contribution

It extends the understanding of suffix tree size from memoryless to Markov sources and derives a new formula for trie size in this context.

Findings

01. Average suffix tree size asymptotically matches trie size for Markov sources

02. Derived a formula for trie size under Markovian models

03. Applied novel analytic combinatorics techniques to word patterns

Abstract

We study a suffix tree built from a sequence generated by a Markovian source. Such sources are more realistic probabilistic models for text generation, data compression, molecular applications, and so forth. We prove that the average size of such a suffix tree is asymptotically equivalent to the average size of a trie built over $n$ independent sequences from the same Markovian source. Until now, this equivalence was known only for memoryless sources. We then derive a formula for the size of a trie under a Markovian model to complete the analysis for suffix trees. We accomplish our goal by applying some novel techniques of analytic combinatorics on words, also known as analytic pattern matching.
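
The comparison stated in the abstract can be explored numerically. The following is a minimal simulation sketch, not the authors' method: it assumes a binary alphabet, a first-order Markov chain with illustrative transition probabilities, and that "size" means the number of internal nodes of the (non-compact) trie. All function names and the `depth_cap` truncation are hypothetical choices made for this illustration.

```python
import random


def markov_sequence(length, p_one_after, rng):
    """Emit `length` binary symbols from a first-order Markov chain.

    p_one_after[a] is the probability that a 1 follows symbol a.
    """
    symbol = rng.randrange(2)  # assumption: uniform initial symbol
    out = []
    for _ in range(length):
        out.append(symbol)
        symbol = 1 if rng.random() < p_one_after[symbol] else 0
    return tuple(out)


def internal_nodes(strings, depth=0):
    """Count internal nodes of the trie built over equal-length strings.

    A node is internal when at least two strings share the prefix
    leading to it. Recursion stops at the end of the strings, so
    identical strings are not separated further.
    """
    if len(strings) < 2 or depth >= len(strings[0]):
        return 0
    buckets = {}
    for s in strings:
        buckets.setdefault(s[depth], []).append(s)
    return 1 + sum(internal_nodes(b, depth + 1) for b in buckets.values())


def suffix_tree_size(n, depth_cap, p_one_after, rng):
    """Internal nodes of the trie over the first n suffixes of one text."""
    text = markov_sequence(n + depth_cap, p_one_after, rng)
    suffixes = [text[i:i + depth_cap] for i in range(n)]
    return internal_nodes(suffixes)


def trie_size(n, depth_cap, p_one_after, rng):
    """Internal nodes of the trie over n independent sequences."""
    strings = [markov_sequence(depth_cap, p_one_after, rng) for _ in range(n)]
    return internal_nodes(strings)


if __name__ == "__main__":
    rng = random.Random(2016)
    p_one_after = (0.3, 0.6)  # illustrative transition probabilities
    n, depth_cap, trials = 500, 64, 20
    avg_suffix = sum(suffix_tree_size(n, depth_cap, p_one_after, rng)
                     for _ in range(trials)) / trials
    avg_trie = sum(trie_size(n, depth_cap, p_one_after, rng)
                   for _ in range(trials)) / trials
    print("average suffix tree size:", avg_suffix)
    print("average trie size:", avg_trie)
```

Truncating every string at `depth_cap` symbols keeps the sketch simple; the cap only needs to exceed the depth at which the strings separate, which grows logarithmically in $n$. A simulation at one value of $n$ can only suggest the asymptotic equivalence, which the paper establishes analytically.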

Taxonomy

Topics: Algorithms and Data Compression · DNA and Biological Computing · Semigroups and Automata Theory