A perceptual hash function to store and retrieve large scale DNA sequences
Jocelyn De Goer De Herve, Myoung-Ah Kang, Xavier Bailly, Engelbert Mephu Nguifo

TL;DR
This paper introduces a perceptual hash function based on DCT-SO for storing and retrieving large-scale DNA sequences. Unlike cryptographic hashes, which are subject to the avalanche effect, the resulting hashes can be compared to estimate sequence similarity.
Contribution
The paper adapts perceptual hashing techniques, specifically DCT-SO, for DNA sequences, providing a data-efficient method for similarity-based retrieval of massive genomic data.
Findings
Effective data reduction achieved
Hash comparison via Hamming Distance is viable
Method successfully retrieves similar DNA sequences
Abstract
This paper proposes a novel approach for storing and retrieving massive DNA sequences. The method is based on a perceptual hash function, commonly used to determine the similarity between digital images, that we adapted for DNA sequences. The perceptual hash function presented here is based on a Discrete Cosine Transform Sign Only (DCT-SO). Each nucleotide is encoded as a fixed gray-level intensity pixel, and the hash is calculated from its significant frequency characteristics. This results in a drastic data reduction between the sequence and the perceptual hash. Unlike cryptographic hash functions, perceptual hashes are not affected by the "avalanche effect" and thus can be compared. The similarity distance between two hashes is estimated with the Hamming Distance, which is used to retrieve DNA sequences. Experiments that we conducted show that our approach is relevant for storing massive DNA…
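The pipeline described in the abstract (nucleotide to gray level, DCT, keep only the signs, compare with Hamming distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the gray-level values in `GRAY`, the use of a one-dimensional DCT-II, the choice to skip the DC coefficient, and the default hash length `bits=16` are all assumptions made for illustration, since the abstract does not specify them. The function names `dct_sign_hash` and `hamming` are likewise hypothetical.

```python
import math

# Assumed gray levels for each nucleotide; the abstract does not give
# the actual intensities used by the authors.
GRAY = {"A": 0, "C": 85, "G": 170, "T": 255}


def dct_sign_hash(seq, bits=16):
    """Return a list of `bits` sign bits taken from low-frequency DCT-II
    coefficients of the gray-level encoded sequence (DC term skipped)."""
    pixels = [GRAY[base] for base in seq.upper()]
    n = len(pixels)
    if n <= bits:
        raise ValueError("sequence must be longer than the hash length")
    hash_bits = []
    # Coefficients k = 1 .. bits; k = 0 (the mean intensity) is skipped.
    for k in range(1, bits + 1):
        coeff = sum(
            x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(pixels)
        )
        # Sign-only step: keep 1 for a non-negative coefficient, else 0.
        hash_bits.append(1 if coeff >= 0 else 0)
    return hash_bits


def hamming(h1, h2):
    """Count the positions at which two equal-length hashes differ."""
    if len(h1) != len(h2):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(h1, h2))


if __name__ == "__main__":
    reference = "ACGTTGCAAGCTTAGGCTAACGTTAGCATCGA"
    mutated = "ACGTTGCAAGCTTAGGCTAACGTTAGCATCGT"  # last base changed
    h_ref = dct_sign_hash(reference)
    h_mut = dct_sign_hash(mutated)
    print("hash bits:", len(h_ref))
    print("Hamming distance:", hamming(h_ref, h_mut))
```

In this sketch a sequence of any length above `bits` is reduced to a fixed number of bits, which is the data reduction the abstract refers to. Retrieval would then amount to ranking stored hashes by their Hamming distance to the query hash.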
Taxonomy
Topics: Digital Media Forensic Detection · Fractal and DNA sequence analysis · Cell Image Analysis Techniques
