Combining AI and AM - Improving Approximate Matching through Transformer Networks
Frieder Uhlig, Lukas Struppek, Dominik Hintersdorf, Thomas Göbel, Harald Baier, Kristian Kersting

TL;DR
This paper introduces DLAM, a transformer-based AI approach that enhances approximate matching in digital forensics by improving detection accuracy and scalability for file fragments, surpassing traditional methods like TLSH and ssdeep.
Contribution
The paper presents DLAM, a novel transformer-based approximate matching algorithm that eliminates manual feature extraction and improves detection of small file fragments in digital forensics.
Findings
DLAM achieves higher accuracy than TLSH and ssdeep.
DLAM enables detection of small fragments in large datasets.
DLAM reduces manual feature extraction effort.
Abstract
Approximate matching (AM) is a concept in digital forensics to determine the similarity between digital artifacts. An important use case of AM is the reliable and efficient detection of case-relevant data structures on a blacklist, if only fragments of the original are available. For instance, if only a cluster of indexed malware is still present during the digital forensic investigation, the AM algorithm shall be able to assign the fragment to the blacklisted malware. However, traditional AM functions like TLSH and ssdeep fail to detect files based on their fragments if the presented piece is relatively small compared to the overall file size. A second well-known issue with traditional AM algorithms is the lack of scaling due to the ever-increasing lookup databases. We propose an improved matching algorithm based on transformer models from the field of natural language processing. We…
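The fragment problem described in the abstract can be illustrated with a toy sketch. This is not DLAM, and it is not how TLSH or ssdeep compute their scores. It assumes a simple byte n-gram representation, and the helper names (`ngrams`, `jaccard`, `containment`) are hypothetical. It only shows why a symmetric similarity score shrinks when the fragment is small relative to the whole file, while an asymmetric containment score does not.

```python
import random


def ngrams(data: bytes, n: int = 8) -> set:
    """Return the set of all overlapping byte n-grams in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Symmetric similarity: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(fragment: set, whole: set) -> float:
    """Asymmetric similarity: share of the fragment's n-grams found in the whole."""
    return len(fragment & whole) / len(fragment) if fragment else 0.0


# Synthetic stand-ins for real data (assumption: random bytes, fixed seed).
rng = random.Random(0)
blacklisted = bytes(rng.getrandbits(8) for _ in range(100_000))
fragment = blacklisted[40_000:40_512]  # a 512-byte piece of the file
unrelated = bytes(rng.getrandbits(8) for _ in range(512))

file_grams = ngrams(blacklisted)
frag_grams = ngrams(fragment)

# The symmetric score is dominated by the size difference between the inputs.
jaccard_score = jaccard(frag_grams, file_grams)
# The containment score only asks how much of the fragment appears in the file.
containment_score = containment(frag_grams, file_grams)
# An unrelated 512-byte buffer should share almost nothing with the file.
unrelated_score = containment(ngrams(unrelated), file_grams)

print(f"jaccard={jaccard_score:.4f} "
      f"containment={containment_score:.4f} "
      f"unrelated={unrelated_score:.4f}")
```

Storing every n-gram of every blacklisted file grows with the size of the lookup database, which is the scaling issue the abstract also raises.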
Taxonomy
Topics: Digital and Cyber Forensics · Digital Media Forensic Detection · Advanced Malware Detection Techniques
Methods: Attention Model
