Hash in a Flash: Hash Tables for Solid State Devices
Tyler Clemons, S. M. Faisal, Shirish Tatikonda, Charu Aggarwal, and Srinivasan Parthasarathy

TL;DR
This paper explores the design of hash tables optimized for flash storage devices, addressing the challenges of random writes and limited erasures to improve performance for algorithms like TF-IDF.
Contribution
It introduces a novel hash table design using two related hash functions to reduce random writes and improve efficiency on flash devices.
Findings
Two related hash functions can improve data placement and reduce random writes.
The proposed designs balance query, insert, and update performance.
Implementation with TF-IDF demonstrates trade-offs and benefits of the new approach.
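One way to picture the "two related hash functions" idea is the sketch below. It is a minimal illustration under stated assumptions, not the paper's exact scheme: the names `disk_slot`, `ram_bucket`, `SLOTS_PER_BLOCK`, and `NUM_BLOCKS` are hypothetical, integer keys are assumed, and the modulo/integer-division pairing is just one simple way to relate two hash functions so that every key buffered in a given in-memory bucket belongs to the same on-disk block.

```python
from collections import defaultdict

SLOTS_PER_BLOCK = 4   # hypothetical: hash slots stored in one flash block
NUM_BLOCKS = 8        # hypothetical: number of blocks in the on-disk table


def disk_slot(key: int) -> int:
    """On-disk hash function: slot index in the flash-resident table."""
    return key % (SLOTS_PER_BLOCK * NUM_BLOCKS)


def ram_bucket(key: int) -> int:
    """In-memory hash function, derived from disk_slot.

    All slots that live in the same flash block share one RAM bucket.
    """
    return disk_slot(key) // SLOTS_PER_BLOCK


def buffer_updates(keys):
    """Group pending updates by RAM bucket (one bucket per flash block)."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[ram_bucket(key)].append(key)
    return dict(buckets)


def blocks_touched(bucket_keys):
    """Set of flash blocks that flushing this bucket would write to."""
    return {disk_slot(k) // SLOTS_PER_BLOCK for k in bucket_keys}


if __name__ == "__main__":
    pending = buffer_updates([1, 2, 33, 34, 70, 5])
    for bucket, keys in sorted(pending.items()):
        print(bucket, keys, blocks_touched(keys))
```

Because `ram_bucket` is computed from `disk_slot`, flushing one in-memory bucket touches a single block in this sketch, which is the kind of grouping that can turn many scattered small writes into fewer block-sized ones.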
Abstract
In recent years, information retrieval algorithms have taken center stage for extracting important data in ever larger datasets. Advances in hardware technology have led to the increasingly widespread use of flash storage devices. Such devices have clear benefits over traditional hard drives in terms of latency of access, bandwidth, and random access capabilities, particularly when reading data. There are, however, some interesting trade-offs to consider when leveraging the advanced features of such devices. On a relative scale, writing to such devices can be expensive. This is because typical flash devices (NAND technology) are updated in blocks. A minor update to a given block requires the entire block to be erased, followed by a re-writing of the block. On the other hand, sequential writes can be two orders of magnitude faster than random writes. In addition, random writes are degrading…
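The block-granular update cost described in the abstract can be illustrated with a toy count of erase-and-rewrite cycles. This is a sketch under assumptions: the function names and the default of 64 entries per block are hypothetical, and the code only counts block rewrites; it does not model a real device or its flash translation layer.

```python
def erases_unbuffered(slots, entries_per_block=64):
    """Apply each update immediately: one block erase-and-rewrite per update."""
    return len(slots)


def erases_batched(slots, entries_per_block=64):
    """Group updates by target block first: one erase-and-rewrite per distinct block."""
    return len({slot // entries_per_block for slot in slots})


if __name__ == "__main__":
    updates = [0, 1, 2, 65, 66, 130]  # hypothetical slot indices to update
    print(erases_unbuffered(updates), erases_batched(updates))
```

In this toy setting, six small updates that fall into three blocks cost six block rewrites when applied one by one, but three when grouped by block before writing.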
Taxonomy
Topics: Advanced Data Storage Technologies · Caching and Content Delivery · Algorithms and Data Compression
