Hierarchical Bloom Filter Trees for Approximate Matching

David Lillis; Frank Breitinger; Mark Scanlon

arXiv:1712.04544 · cs.CR · November 15, 2022

TL;DR

This paper introduces Hierarchical Bloom Filter Trees (HBFTs) to improve the scalability and speed of approximate bytewise matching in digital forensics, enabling faster collection-to-collection searches without sacrificing accuracy.

Contribution

The paper presents a novel HBFT data structure that significantly reduces pairwise comparisons in approximate matching, enhancing efficiency for large forensic datasets.
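The general idea of arranging Bloom filters in a tree so that most pairwise comparisons can be skipped can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation or the code in the linked repository: the names `BloomFilter`, `Node`, `chunks`, `build_tree`, `search`, and the `min_hits` threshold are all hypothetical, and fixed-size chunking stands in for whatever feature extraction the real approximate matching algorithm uses.

```python
import hashlib


class BloomFilter:
    """A basic Bloom filter; sizes are illustrative assumptions."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: bytes):
        # Derive several bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        for i in range(self.num_hashes):
            word = digest[4 * i:4 * i + 4]
            yield int.from_bytes(word, "big") % self.size_bits

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def chunks(data: bytes, size=64):
    # Assumption: fixed-size, non-overlapping chunks as stand-in features.
    return [data[i:i + size] for i in range(0, len(data), size)]


class Node:
    def __init__(self, names):
        self.names = names          # files covered by this subtree
        self.bloom = BloomFilter()  # holds features of all those files
        self.children = []


def build_tree(files: dict) -> Node:
    """Build a binary tree; each leaf covers exactly one file."""
    def build(names):
        node = Node(names)
        for name in names:
            for chunk in chunks(files[name]):
                node.bloom.add(chunk)
        if len(names) > 1:
            mid = len(names) // 2
            node.children = [build(names[:mid]), build(names[mid:])]
        return node
    return build(sorted(files))


def search(node: Node, query: bytes, min_hits=2):
    """Return names of candidate files that may share content with query."""
    hits = sum(1 for chunk in chunks(query) if chunk in node.bloom)
    if hits < min_hits:
        return []                   # prune this whole subtree
    if not node.children:
        return list(node.names)     # leaf: candidate for full comparison
    return [name for child in node.children
            for name in search(child, query, min_hits)]
```

In this sketch a query that finds too few matching features at a node skips every file beneath it, so only the candidates returned from the leaves would need a full pairwise comparison. Bloom filters can report false positives, which is why the leaf results are treated as candidates rather than confirmed matches.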

Findings

01. HBFT reduces matching time substantially

02. Maintains high accuracy in approximate matching

03. Effective in large-scale forensic data searches

Abstract

Bytewise approximate matching algorithms have in recent years shown significant promise in detecting files that are similar at the byte level. This is very useful for digital forensic investigators, who are regularly faced with the problem of searching through a seized device for pertinent data. A common scenario is where an investigator is in possession of a collection of "known-illegal" files (e.g. a collection of child abuse material) and wishes to find whether copies of these are stored on the seized device. Approximate matching addresses shortcomings in traditional hashing, which can only find identical files, by also being able to deal with cases of merged files, embedded files, partial files, or if a file has been changed in any way. Most approximate matching algorithms work by comparing pairs of files, which is not a scalable approach when faced with large corpora. This…

Code & Models

Repositories

ishnid/mrsh-hbft (Official)
