New Entropy Measures for Tries with Applications to the XBWT
Lorenzo Carfagna, Carlo Tosoni

TL;DR
This paper introduces two new entropy measures for tries that better capture their compressibility, and demonstrates how to efficiently compress and index tries using these measures, improving upon previous methods.
Contribution
The paper proposes novel entropy measures for tries that incorporate symbol frequencies and topology, extending string entropy concepts to tries, and develops an efficient compression and indexing method based on these measures.
Findings
New entropy measures for tries that consider symbol frequencies and topology.
Trie compression using XBWT within the proposed entropy bounds.
The new encoding is always smaller than those of previous methods, and in some cases asymptotically smaller.
Abstract
Entropy quantifies the number of bits required to store objects under certain given assumptions. While this is a well-established concept for strings, in the context of tries the state of the art regarding entropies is less developed. The standard trie worst-case entropy considers the set of tries with a fixed number of nodes and alphabet size. However, this approach does not consider the frequencies of the symbols in the trie, thus failing to capture the compressibility of tries with skewed character distributions. On the other hand, the label entropy [FOCS '05], proposed for node-labeled trees, does not take into account the tree topology, which has to be stored separately. In this paper, we introduce two new entropy measures for tries, worst-case and empirical, which overcome the two aforementioned limitations. Notably, our entropies satisfy similar properties of their string…
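To illustrate the limitation of the standard trie worst-case entropy mentioned in the abstract, here is a minimal sketch. It assumes the usual counting formula for tries (cardinal trees) with n nodes over an alphabet of size sigma, namely C(sigma*n + 1, n) / (sigma*n + 1); this formula and the function names are not taken from the text above. The point is only that the result depends on n and sigma alone, so skewed symbol frequencies cannot lower it.

```python
from math import comb, log2


def count_tries(n: int, sigma: int) -> int:
    """Number of tries with n nodes over an alphabet of size sigma.

    Assumption: the classical cardinal-tree count
    C(sigma*n + 1, n) / (sigma*n + 1); the division is exact.
    """
    return comb(sigma * n + 1, n) // (sigma * n + 1)


def standard_worst_case_entropy(n: int, sigma: int) -> float:
    """Bits needed to distinguish any trie with n nodes and alphabet size sigma."""
    return log2(count_tries(n, sigma))


if __name__ == "__main__":
    # The value is the same for every trie with these parameters,
    # regardless of how often each symbol labels an edge.
    for n, sigma in [(3, 2), (100, 4), (100, 256)]:
        print(n, sigma, round(standard_worst_case_entropy(n, sigma), 2))
```

For sigma = 2 the count reduces to the Catalan numbers (for example, 5 binary tries with 3 nodes), which is a quick sanity check on the formula.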
Taxonomy
Topics: Algorithms and Data Compression · Data Quality and Management · Computability, Logic, AI Algorithms
