Hamming Distributions of Popular Perceptual Hashing Techniques
Sean McKeown, William J Buchanan

TL;DR
This paper evaluates how well several popular perceptual hashing algorithms match images that have undergone content-preserving modifications, analyzing their Hamming distance distributions on a million-image scale dataset.
Contribution
It provides a million-image scale analysis of the robustness of perceptual hash algorithms (including PDQ, Neuralhash, and pHash) against seven content-preserving image variants.
Findings
Hamming distance distributions vary significantly across algorithms.
Some algorithms show better robustness to certain modifications.
The study highlights limitations and strengths of different perceptual hashes.
Abstract
Content-based file matching has been widely deployed for decades, largely for the detection of sources of copyright infringement, extremist materials, and abusive sexual media. Perceptual hashes, such as Microsoft's PhotoDNA, are one automated mechanism for facilitating detection, allowing for machines to approximately match visual features of an image or video in a robust manner. However, there does not appear to be much public evaluation of such approaches, particularly when it comes to how effective they are against content-preserving modifications to media files. In this paper, we present a million-image scale evaluation of several perceptual hashing archetypes for popular algorithms (including Facebook's PDQ, Apple's Neuralhash, and the popular pHash library) against seven image variants. The focal point is the distribution of Hamming distance scores between both unrelated images…
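As an illustration of the metric the abstract focuses on, the following is a minimal Python sketch of computing the Hamming distance between two perceptual hashes and tallying a distribution of distances over image pairs. It is not the authors' evaluation code: the hex encoding of hashes, the function names `hamming_distance` and `distance_distribution`, and the sample hash values are all assumptions made for illustration.

```python
from collections import Counter


def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two equal-length hex-encoded hashes.

    Assumption: hashes are given as hex strings of the same length.
    """
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 in every bit position where the two hashes differ.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def distance_distribution(pairs):
    """Map each observed Hamming distance to the number of pairs with it."""
    return Counter(hamming_distance(a, b) for a, b in pairs)


# Hypothetical 16-bit hashes; real perceptual hashes are typically longer.
sample_pairs = [
    ("ff00", "0f00"),  # differs in 4 bits
    ("abcd", "abcd"),  # identical
    ("ffff", "0000"),  # differs in all 16 bits
]
distribution = distance_distribution(sample_pairs)
```

Comparing such a distribution for pairs of unrelated images with the one for an image and its modified variants is one way to judge where a matching threshold could sit.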
Taxonomy
Topics: Digital Media Forensic Detection · Advanced Image and Video Retrieval Techniques · Advanced Steganography and Watermarking Techniques
