Towards More Realistic Probabilistic Models for Data Structures: The External Path Length in Tries under the Markov Model
Kevin Leckey, Ralph Neininger, Wojciech Szpankowski

TL;DR
This paper analyzes the external path length of tries built from strings generated by a Markov source and proves a central limit theorem for it, extending the analysis of trie performance beyond memoryless sources.
Contribution
It combines the contraction method with analytic techniques in a novel way to analyze trie parameters under the Markov model, with an application to the Lempel-Ziv'77 code.
Findings
Proves a central limit theorem for trie external path length under Markov sources.
Applies the results to the Lempel-Ziv'77 compression scheme.
Provides a framework for analyzing other trie parameters and data structures.
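To make the quantity in these findings concrete, here is a minimal simulation sketch, not the paper's method. It assumes a binary alphabet, a first-order Markov chain with illustrative transition probabilities, a uniform initial symbol, and finite equal-length strings standing in for infinite ones. The names `markov_string`, `external_path_length`, and `p_one_after` are hypothetical. The external path length is computed as the sum of the depths of all leaves in the trie, where each string is stored at the shortest prefix that separates it from the others.

```python
import random


def markov_string(rng, length, p_one_after):
    """Generate a binary string from a first-order Markov chain.

    p_one_after[a] is the (illustrative) probability that the next symbol
    is "1" given that the current symbol is a.
    """
    symbols = [rng.choice("01")]  # uniform initial symbol (an assumption)
    while len(symbols) < length:
        p = p_one_after[symbols[-1]]
        symbols.append("1" if rng.random() < p else "0")
    return "".join(symbols)


def external_path_length(strings, depth=0):
    """Sum of leaf depths in the trie built from distinct, equal-length strings."""
    # A subtree holding zero or one string is a leaf (or empty) at this depth.
    if len(strings) <= 1:
        return depth * len(strings)
    # Truncated strings that never separate cannot be placed in distinct leaves.
    if depth >= len(strings[0]):
        raise ValueError("strings too short to be separated; increase length")
    # Split the strings by their symbol at the current depth and recurse.
    buckets = {}
    for s in strings:
        buckets.setdefault(s[depth], []).append(s)
    return sum(external_path_length(b, depth + 1) for b in buckets.values())


if __name__ == "__main__":
    rng = random.Random(0)
    p_one_after = {"0": 0.3, "1": 0.6}  # illustrative values, not from the paper
    sample = [markov_string(rng, 64, p_one_after) for _ in range(200)]
    print(external_path_length(sample))
```

Repeating the simulation over many independent samples and plotting a histogram of the results gives an informal picture of the fluctuations that the central limit theorem describes; it does not replace the proof.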
Abstract
Tries are among the most versatile and widely used data structures on words. They are pertinent to the (internal) structure of (stored) words and several splitting procedures used in diverse contexts ranging from document taxonomy to IP address lookup, from data compression (e.g., the Lempel-Ziv'77 scheme) to dynamic hashing, from partial-match queries to speech recognition, from leader election algorithms to distributed hash tables and graph compression. While the performance of tries under a realistic probabilistic model is of significant importance, its analysis, even for the simplest memoryless sources, has proved difficult. Inherently complex parameters were rarely analyzed rigorously (with a few notable exceptions) under more realistic models of string generation. In this paper we meet these challenges: By a novel use of the contraction method combined with analytic…
Taxonomy
Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · Natural Language Processing Techniques
