A Unified Evaluation of Learning-Based Similarity Techniques for Malware Detection

Udbhav Prasad; Aniesh Chawla

arXiv:2602.15376 · cs.CR · February 18, 2026

Open Access

TL;DR

This paper systematically compares various learning-based similarity techniques for malware detection using a unified framework, revealing that combining different methods yields better results than relying on a single approach.

Contribution

It provides the first reproducible benchmark of diverse learning-based similarity methods for malware detection under a unified evaluation framework.

Findings

01

No single technique outperforms others across all metrics.

02

Different methods exhibit distinct strengths and trade-offs.

03

Combining multiple techniques improves malware detection effectiveness.

Abstract

Cryptographic digests (e.g., MD5, SHA-256) are designed to provide exact identity: any single-bit change in the input produces a completely different hash. This is ideal for integrity verification but limits their usefulness in real-world tasks such as threat hunting, malware analysis, and digital forensics, where adversaries routinely introduce minor transformations. Similarity-based techniques address this limitation by enabling approximate matching, so that related byte sequences produce measurably similar fingerprints. Modern enterprises manage tens of thousands of endpoints with billions of files, which makes the effectiveness and scalability of these techniques more important than ever in security applications. Security researchers have proposed a range of approaches, including similarity digests and locality-sensitive hashes (e.g., ssdeep, sdhash, TLSH), as well as more…
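The contrast between exact-identity hashing and approximate matching can be illustrated with a small sketch. It flips one bit in a synthetic 1 KiB sample and compares the two copies in two ways: by their SHA-256 digests, and by a plain Jaccard index over overlapping byte 4-grams. The Jaccard measure is a simple stand-in chosen only for illustration. It is not ssdeep, sdhash, or TLSH, nor any method evaluated in the paper. The function names, the 4-gram window, and the sample data are all hypothetical.

```python
from __future__ import annotations

import hashlib


def sha256_hex(data: bytes) -> str:
    """Exact-identity fingerprint: any change to `data` yields an unrelated digest."""
    return hashlib.sha256(data).hexdigest()


def digest_bit_difference(a: bytes, b: bytes) -> int:
    """Number of differing bits between the SHA-256 digests of `a` and `b` (0..256)."""
    x = int.from_bytes(hashlib.sha256(a).digest(), "big")
    y = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(x ^ y).count("1")


def byte_ngrams(data: bytes, n: int = 4) -> set[bytes]:
    """Set of all overlapping n-byte windows in `data` (hypothetical helper)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def ngram_jaccard(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard index of the byte n-gram sets: 1.0 = identical sets, 0.0 = disjoint.

    Illustrative stand-in for approximate matching, NOT ssdeep/sdhash/TLSH.
    """
    ga, gb = byte_ngrams(a, n), byte_ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


def make_samples() -> tuple[bytes, bytes]:
    """Hypothetical 1 KiB sample plus a copy with a single bit flipped."""
    original = bytes(range(256)) * 4
    mutated = bytearray(original)
    mutated[100] ^= 0x01  # flip the lowest bit of one byte (100 -> 101)
    return original, bytes(mutated)


if __name__ == "__main__":
    original, mutated = make_samples()
    print("SHA-256 equal:", sha256_hex(original) == sha256_hex(mutated))
    print("digest bits differing:", digest_bit_difference(original, mutated))
    print("4-gram Jaccard:", round(ngram_jaccard(original, mutated), 3))
```

After the single-bit flip, the SHA-256 digests are unrelated; typically about half of the 256 digest bits differ. The 4-gram Jaccard score stays at 256/260 (about 0.985), because only the four windows that cover the changed byte are new. Production similarity digests use far more compact and robust fingerprints than a raw n-gram set, but the behavior they aim for is the same.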

Taxonomy

Topics: Advanced Malware Detection Techniques · Digital and Cyber Forensics · Network Security and Intrusion Detection