Optimal compression of hash-origin prefix trees
Jarek Duda

TL;DR
This paper analyzes the informational limits of hash-origin prefix trees, proposing optimal compression methods that significantly reduce memory usage compared to standard approaches and Bloom filters.
Contribution
It derives the asymptotic minimum number of bits per element for a prefix tree that distinguishes elements, and relates this to the optimal encoding of large unordered numbers, improving memory efficiency.
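The saving from encoding unordered numbers can be checked numerically. The following is a minimal Python sketch, not the paper's method. It assumes the n numbers are distinct b-bit values with 2^b much larger than n^2, so that an unordered set of them costs about n·b − lg(n!) bits. The function names `lg_factorial` and `set_encoding_bits` are hypothetical, chosen for illustration. By Stirling's formula, the saving per element approaches lg n − lg e ≈ lg n − 1.4427.

```python
import math

def lg_factorial(n: int) -> float:
    """lg(n!) computed via the log-gamma function: ln Gamma(n + 1) / ln 2."""
    return math.lgamma(n + 1) / math.log(2)

def set_encoding_bits(n: int, b: int) -> float:
    """Approximate bits for an unordered set of n distinct b-bit numbers.

    Assumes 2**b >> n**2: an ordered list costs n*b bits, and forgetting
    the order of the elements saves lg(n!) bits.
    """
    return n * b - lg_factorial(n)

if __name__ == "__main__":
    b = 64  # hypothetical hash width in bits
    for n in (10**3, 10**6, 10**9):
        saved_per_element = lg_factorial(n) / n
        stirling = math.log2(n) - math.log2(math.e)  # leading Stirling terms: lg n - lg e
        print(f"n={n:>10}  saved/element={saved_per_element:.4f}  "
              f"lg n - lg e={stirling:.4f}  "
              f"set bits/element={set_encoding_bits(n, b) / n:.4f}")
```

The gap between the exact per-element saving and lg n − lg e shrinks like lg(2πn)/(2n), so it is already negligible for n = 10^6.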
Findings
A minimal prefix tree that distinguishes elements requires about 2.77544 bits per element.
When all queried elements are known to be in the database, the cost of distinguishability can be reduced to about 2.33275 bits per element.
Memory requirements can be reduced to about 0.693 of the size of a Bloom filter.
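The factor 0.693 is numerically ln 2. One standard way this ratio arises is by comparing an optimally tuned Bloom filter, which uses −ln(ε)/(ln 2)^2 bits per element for false-positive rate ε, with the information-theoretic minimum of lg(1/ε) bits per element. The Python sketch below shows that comparison. It assumes the paper's figure refers to this leading-order comparison. The constant per-element overhead of any real structure, such as the few bits listed above, is ignored, so the ratio would only be approached as ε becomes small. The function names are hypothetical.

```python
import math

def bloom_bits_per_element(eps: float) -> float:
    """Bits per element of an optimally tuned Bloom filter: m/n = -ln(eps) / (ln 2)^2."""
    return -math.log(eps) / math.log(2) ** 2

def lower_bound_bits_per_element(eps: float) -> float:
    """Information-theoretic minimum for false-positive rate eps: lg(1/eps) bits per element.

    Assumption: constant per-element overheads of a concrete structure are ignored.
    """
    return -math.log2(eps)

if __name__ == "__main__":
    for eps in (1e-2, 1e-4, 1e-8):
        bloom = bloom_bits_per_element(eps)
        bound = lower_bound_bits_per_element(eps)
        print(f"eps={eps:g}  Bloom={bloom:.3f}  lg(1/eps)={bound:.3f}  "
              f"ratio={bound / bloom:.4f}")
```

Under these assumptions the ratio equals ln 2 ≈ 0.6931 for every ε. A Bloom filter therefore spends about 1/ln 2 ≈ 1.44 times the minimum number of bits.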
Abstract
There is a common problem of operating on hash values of elements of some database. This paper analyzes the informational content of this general task and how to practically approach the resulting lower bounds. A minimal prefix tree that distinguishes elements turns out to require asymptotically only about 2.77544 bits per element, while standard approaches use a few times more. When it is certain that we are working inside the database, the cost of distinguishability can be reduced further to about 2.33275 bits per element. Increasing the minimal depth of nodes to reduce the probability of false positives leads to a simple relation with the average depth of such a random tree, which is asymptotically larger by about 1.33275 bits than the lg(n) of a perfect binary tree. This asymptotic case can also be seen as a way to optimally encode n large unordered numbers, saving lg(n!) bits of information…
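The claim that a random minimal prefix tree is on average about 1.33275 levels deeper than lg(n) can be checked by simulation. For binary tries over uniformly random keys, the average leaf depth is known to approach lg n + γ/ln 2 + 1/2 up to tiny periodic fluctuations, where γ is Euler's constant. Since γ/ln 2 + 1/2 ≈ 1.33275, this matches the figure above.

The Python sketch below models hashes as distinct uniformly random 64-bit integers, which is an assumption. It is an illustration, not the paper's construction, and the function name `average_leaf_depth` is hypothetical. Each leaf's depth is its shortest distinguishing prefix: one more than its longest common prefix with any other value. After sorting, that longest common prefix is always shared with an adjacent neighbour.

```python
import math
import random

def average_leaf_depth(n: int, bits: int = 64, seed: int = 0) -> float:
    """Average leaf depth of the minimal prefix tree distinguishing n random hashes."""
    if n < 2:
        raise ValueError("need at least two elements to distinguish")
    rng = random.Random(seed)
    values = set()
    while len(values) < n:  # draw distinct uniform 'hash values'
        values.add(rng.getrandbits(bits))
    xs = sorted(values)

    def lcp(a: int, b: int) -> int:
        # Number of leading bits on which the distinct values a and b agree.
        return bits - (a ^ b).bit_length()

    total = 0
    for i, x in enumerate(xs):
        best = 0  # longest common prefix with any other value (an adjacent one)
        if i > 0:
            best = max(best, lcp(xs[i - 1], x))
        if i + 1 < len(xs):
            best = max(best, lcp(x, xs[i + 1]))
        total += best + 1  # one more bit distinguishes x from everything else
    return total / n

if __name__ == "__main__":
    predicted_excess = 0.5772156649 / math.log(2) + 0.5  # gamma/ln 2 + 1/2
    for n in (10**3, 10**4, 10**5):
        excess = average_leaf_depth(n) - math.log2(n)
        print(f"n={n:>6}  avg depth - lg n = {excess:.4f}  "
              f"(asymptotic ~ {predicted_excess:.5f})")
```

Individual runs scatter around 1.33 by a few hundredths for these n, because the average is taken over a single random tree.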
Taxonomy
Topics: Algorithms and Data Compression · Caching and Content Delivery · DNA and Biological Computing
