Dynamic External Hashing: The Limit of Buffering
Zhewei Wei, Ke Yi, Qin Zhang

TL;DR
This paper investigates the fundamental tradeoff between query and insertion costs in external hash tables that have a small internal memory buffer, and establishes limits on how much buffering can reduce the insertion cost.
Contribution
It proves that when the query cost must be very close to the optimal single I/O, an internal memory buffer cannot significantly reduce the amortized insertion cost. This answers an open question about external memory data structures.
Findings
Targeting a near-optimal query cost limits how cheap insertions can be when the memory buffer is small.
Buffers are ineffective for reducing insertion costs if query costs are very close to optimal.
The results clarify the inherent tradeoffs in external hash table design.
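The buffering finding can be made concrete with a toy model. Below is a minimal sketch, assuming a hash table with a fixed number of buckets, an ideal random hash (simulated with `random`), and a cost model that charges one block write per distinct bucket touched when the buffer is flushed (ignoring reads and overflow blocks). The function name `amortized_flush_cost` and its parameters are hypothetical. This is not the paper's lower-bound argument; it only illustrates why the obvious "buffer, then flush" strategy saves little unless the buffer is comparable to the number of buckets.

```python
import random

def amortized_flush_cost(num_buckets, buffer_size, n_inserts, seed=0):
    """Amortized block writes per insertion under a hypothetical buffering model.

    Each insert hashes to a uniformly random bucket (stand-in for an ideal hash).
    When the buffer holds `buffer_size` items it is flushed: every distinct
    bucket that received at least one buffered item costs one block write.
    """
    rng = random.Random(seed)
    writes = 0
    pending = set()   # distinct buckets that have items waiting in the buffer
    buffered = 0      # number of items currently in the buffer
    for _ in range(n_inserts):
        pending.add(rng.randrange(num_buckets))
        buffered += 1
        if buffered == buffer_size:
            writes += len(pending)  # one write per touched bucket
            pending.clear()
            buffered = 0
    writes += len(pending)  # flush whatever is left at the end
    return writes / n_inserts

# With 10,000 buckets, a small buffer saves almost nothing per insert;
# only a buffer much larger than the bucket count drives the cost well below 1.
for m in (10, 1_000, 100_000):
    print(m, amortized_flush_cost(10_000, m, 200_000))
```

Items in a small buffer almost always land in different buckets, so each flushed item still costs about one block write; batching only pays off once many buffered items share a bucket.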
Abstract
Hash tables are one of the most fundamental data structures in computer science, in both theory and practice. They are especially useful in external memory, where their query performance approaches the ideal cost of just one disk access. Knuth gave an elegant analysis showing that with some simple collision resolution strategies such as linear probing or chaining, the expected average number of disk I/Os of a lookup is merely $1 + 1/2^{\Omega(b)}$, where each I/O can read a disk block containing $b$ items. Inserting a new item into the hash table also costs $1 + 1/2^{\Omega(b)}$ I/Os, which is again almost the best one can do if the hash table is entirely stored on disk. However, this assumption is unrealistic since any algorithm operating on an external hash table must have some internal memory (at least $\Omega(1)$ blocks) to work with. The availability of a small internal memory buffer…
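To see why a lookup costs about one I/O, roughly 1 + 1/2^{Ω(b)}, when a block holds b items, here is a minimal simulation sketch of external chaining. It assumes each bucket is a chain of disk blocks of capacity b, keys are placed by an ideal random hash (simulated with `random`), and the table is kept at a fixed load factor. The names `build_chained_table` and `avg_successful_lookup_ios` are hypothetical, and this is a Monte Carlo illustration rather than Knuth's analysis.

```python
import random

def build_chained_table(n_items, b, load=0.5, seed=0):
    """Hash n_items keys into buckets whose chains are stored in blocks of b items.

    The bucket count is chosen so that n_items / (num_buckets * b) is about `load`.
    Returns a list of buckets; each bucket lists its keys in insertion order.
    """
    rng = random.Random(seed)
    num_buckets = max(1, int(n_items / (b * load)))
    buckets = [[] for _ in range(num_buckets)]
    for key in range(n_items):
        # Uniform random bucket: a stand-in for an ideal hash function.
        buckets[rng.randrange(num_buckets)].append(key)
    return buckets

def avg_successful_lookup_ios(buckets, b):
    """Average block reads to find a stored key.

    The key at position i of its chain lives in block i // b, so finding it
    requires reading blocks 0 .. i // b, i.e. i // b + 1 I/Os.
    """
    total_ios = 0
    total_keys = 0
    for chain in buckets:
        for pos in range(len(chain)):
            total_ios += pos // b + 1
            total_keys += 1
    return total_ios / total_keys

# As the block size b grows (at a fixed load factor), overflow blocks become
# rare and the average lookup cost approaches one I/O.
for b in (1, 4, 16, 64):
    table = build_chained_table(100_000, b)
    print(b, avg_successful_lookup_ios(table, b))
```

With b = 1 this reduces to ordinary chaining, where a successful search costs about 1 + α/2 probes at load factor α; larger blocks absorb most collisions within the first block.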
