Locality-Preserving Minimal Perfect Hashing of k-mers
Giulio Ermanno Pibiri, Yoshihiro Shibuya, and Antoine Limasset

TL;DR
This paper introduces a new locality-preserving minimal perfect hash function for k-mers that exploits their intrinsic relationships, yielding smaller functions and faster queries than existing methods.
Contribution
It presents a novel locality-preserving MPHF tailored to k-mers extracted consecutively from strings, which exploits the overlaps between consecutive k-mers to reduce space usage and improve query efficiency.
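The two properties the construction relies on can be seen on a toy input. First, consecutive k-mers overlap by k−1 symbols. Second, a locality-preserving bijection sends consecutive k-mers to consecutive addresses. The sketch below is illustrative only and is not the authors' construction. It assumes a simple "order of first occurrence" assignment stored in an explicit dictionary, which costs far more than a few bits per key. It also uses 0-based addresses for Python convenience. The helper names (`distinct_kmers`, `toy_locality_preserving_map`, `locality_fraction`) are hypothetical.

```python
def distinct_kmers(s: str, k: int) -> list[str]:
    """Distinct k-mers of s, in order of first occurrence."""
    seen: set[str] = set()
    out: list[str] = []
    for i in range(len(s) - k + 1):
        kmer = s[i:i + k]
        if kmer not in seen:
            seen.add(kmer)
            out.append(kmer)
    return out


def toy_locality_preserving_map(s: str, k: int) -> dict[str, int]:
    """Hypothetical toy: a bijection from the n distinct k-mers of s onto
    0..n-1 that gives consecutive addresses to k-mers in order of first
    occurrence. An explicit table, NOT a compact MPHF."""
    return {kmer: addr for addr, kmer in enumerate(distinct_kmers(s, k))}


def locality_fraction(s: str, k: int, f: dict[str, int]) -> float:
    """Fraction of consecutive k-mer pairs (x, y) in s with f[y] == f[x] + 1."""
    pairs = len(s) - k
    if pairs <= 0:
        return 0.0
    hits = sum(1 for i in range(pairs) if f[s[i + 1:i + 1 + k]] == f[s[i:i + k]] + 1)
    return hits / pairs


s, k = "ACGTACGTTACG", 4
# Consecutive k-mers share k-1 symbols: drop the first / last symbol and compare.
assert all(s[i:i + k][1:] == s[i + 1:i + 1 + k][:-1] for i in range(len(s) - k))
f = toy_locality_preserving_map(s, k)
print(len(f), locality_fraction(s, k, f))  # 7 distinct k-mers, 5 of 8 pairs consecutive
```

Repeated k-mers (here `ACGT` and `TACG`) are what break locality: when a k-mer reappears, it keeps its first address, so the next pair cannot be consecutive. The point of a locality-preserving MPHF is to obtain this behaviour in few bits per key, without storing the table.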
Findings
Space usage decreases as k increases
Functions are smaller and faster to query
Outperforms existing MPHFs in experiments
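This page does not explain why space shrinks as k grows. One standard k-mer indexing technique behaves this way: group consecutive k-mers into maximal runs ("super-k-mers") that share the same minimizer, meaning the smallest m-mer in the window. Larger k makes these runs longer. So if a structure stores a constant amount of information per run rather than per k-mer, its cost per k-mer drops. That this is how the paper's construction works is our assumption, not something stated here. The sketch also assumes lexicographic minimizers (practical tools usually hash m-mers instead), m = 7, and a seeded random DNA string. The function names are hypothetical.

```python
import random


def minimizer_pos(kmer: str, m: int) -> int:
    """Start of the lexicographically smallest m-mer in kmer (leftmost on ties).
    Lexicographic order is a simplifying assumption; real tools often hash."""
    best = 0
    for j in range(1, len(kmer) - m + 1):
        if kmer[j:j + m] < kmer[best:best + m]:
            best = j
    return best


def count_super_kmers(s: str, k: int, m: int) -> int:
    """Number of maximal runs of consecutive k-mers of s whose minimizer is
    the same occurrence in s (compared by absolute position)."""
    runs, prev = 0, None
    for i in range(len(s) - k + 1):
        cur = i + minimizer_pos(s[i:i + k], m)  # absolute minimizer position
        if cur != prev:
            runs += 1
            prev = cur
    return runs


rng = random.Random(42)  # fixed seed so the run is reproducible
s = "".join(rng.choice("ACGT") for _ in range(20_000))
m = 7
for k in (15, 21, 31):
    n_kmers = len(s) - k + 1
    runs = count_super_kmers(s, k, m)
    print(f"k={k}: {n_kmers / runs:.2f} k-mers per run on average")
```

On random input, the average run length grows with the window size k − m + 1. This matches the qualitative trend in the findings above, though it does not reproduce the paper's actual numbers.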
Abstract
Minimal perfect hashing is the problem of mapping a static set of $n$ distinct keys into the address space $\{1,\ldots,n\}$ bijectively. It is well-known that $n\log_2 e$ bits are necessary to specify a minimal perfect hash function (MPHF) $f$ when no additional knowledge of the input keys is used. However, in practice the input keys often have intrinsic relationships that we can exploit to lower the bit complexity of $f$. For example, consider a string and the set of all its distinct $k$-mers as input keys: since two consecutive $k$-mers share an overlap of $k-1$ symbols, it seems possible to beat the classic $\log_2 e$ bits/key barrier in this case. Moreover, we would like to map consecutive $k$-mers to consecutive addresses, so as to also preserve as much as possible their relationship in the codomain. This is a useful feature in practice as it…
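The $\log_2 e \approx 1.443$ bits/key figure comes from a counting argument. For a universe much larger than $n$, a random function onto $\{1,\ldots,n\}$ is perfect on a fixed $n$-key set with probability $n!/n^n$. Specifying an MPHF therefore takes about $\log_2(n^n/n!) = n\log_2 e - O(\log n)$ bits. The sketch below evaluates that expression per key with `math.lgamma` (since $\ln n! = \texttt{lgamma}(n+1)$) to show it approaching $\log_2 e$ from below. The function name is hypothetical, and the sketch assumes nothing about the keys, which is exactly the assumption the paper drops for $k$-mers.

```python
import math


def mphf_lower_bound_bits_per_key(n: int) -> float:
    """log2(n^n / n!) / n: per-key counting bound for an MPHF on n keys,
    assuming no structure in the keys and a universe much larger than n."""
    # (n ln n - ln n!) / (n ln 2), with ln n! computed as lgamma(n + 1)
    return (n * math.log(n) - math.lgamma(n + 1)) / (n * math.log(2))


for n in (10, 1_000, 1_000_000):
    print(n, round(mphf_lower_bound_bits_per_key(n), 4))
print("log2(e) =", round(math.log2(math.e), 4))
```

For small $n$, the $O(\log n)$ term is noticeable. For realistic key counts, the bound is essentially $\log_2 e$ bits per key.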
Taxonomy
Topics: Algorithms and Data Compression · Advanced Image and Video Retrieval Techniques · Caching and Content Delivery
