# Fuzzy Hashing as Perturbation-Consistent Adversarial Kernel Embedding

**Authors:** Ari Azarafrooz, John Brock

arXiv: 1812.07071 · 2018-12-19

## TL;DR

This paper introduces a minimax training framework in which fuzzy hash functions are learned from data. The learned functions outperform traditional fuzzy hash functions at file similarity detection for Portable Executable files, including files perturbed by insertion and deletion operations.

## Contribution

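The paper presents a perturbation-consistent minimax architecture for learning fuzzy hash functions. Two kernel networks are trained against each other, with the adversary and generator roles alternating along with the input data, and hash digests are extracted from the networks' kernel embeddings. The resulting hash function adapts to a dataset instead of behaving identically across all datasets.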

## Key findings

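- Learned fuzzy hash functions outperform traditional, data-agnostic fuzzy hash functions at the file similarity task for Portable Executable files.
- The learned functions generalize to perturbed files: they still identify two files as similar when one was generated from the other by insertion and deletion operations.
- The similarity score for a pair of files is the utility of the minimax game in equilibrium, which the authors apply to similar-file detection in malware analysis.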

## Abstract

Measuring the similarity of two files is an important task in malware analysis, with fuzzy hash functions being a popular approach. Traditional fuzzy hash functions are data agnostic: they do not learn from a particular dataset how to determine similarity; their behavior is fixed across all datasets. In this paper, we demonstrate that fuzzy hash functions can be learned in a novel minimax training framework and that these learned fuzzy hash functions outperform traditional fuzzy hash functions at the file similarity task for Portable Executable files. In our approach, hash digests can be extracted from the kernel embeddings of two kernel networks, trained in a minimax framework, where the roles of players during training (i.e., adversary versus generator) alternate along with the input data. We refer to this new minimax architecture as perturbation-consistent. The similarity score for a pair of files is the utility of the minimax game in equilibrium. Our experiments show that learned fuzzy hash functions generalize well and are capable of determining that two files are similar even when one of those files was generated using insertion and deletion operations.
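
To make the evaluation setting concrete, the following is a minimal sketch of a fixed, data-agnostic similarity measure applied to a file perturbed by insertion and deletion operations. It is not the paper's method and not any specific traditional fuzzy hash: the byte n-gram Jaccard score, the helper names (`byte_ngrams`, `jaccard_similarity`, `perturb`), the random byte strings standing in for files, and all parameter values are assumptions chosen for illustration.

```python
import random


def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of all length-n byte windows in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard_similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Fixed (not learned) similarity in [0, 1] based on shared byte n-grams."""
    grams_a, grams_b = byte_ngrams(a, n), byte_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def perturb(data: bytes, n_edits: int, rng: random.Random) -> bytes:
    """Apply n_edits random single-byte insertions or deletions (hypothetical perturbation model)."""
    buf = bytearray(data)
    for _ in range(n_edits):
        if buf and rng.random() < 0.5:
            del buf[rng.randrange(len(buf))]  # deletion
        else:
            buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))  # insertion
    return bytes(buf)


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(2048))   # stand-in for a file
perturbed = perturb(original, n_edits=20, rng=rng)          # edited copy of the same file
unrelated = bytes(rng.randrange(256) for _ in range(2048))  # independent file

sim_perturbed = jaccard_similarity(original, perturbed)
sim_unrelated = jaccard_similarity(original, unrelated)
print(f"original vs perturbed: {sim_perturbed:.3f}")
print(f"original vs unrelated: {sim_unrelated:.3f}")
```

A measure like this behaves the same way on every dataset, which is the limitation the abstract attributes to traditional fuzzy hash functions. The learned approach instead derives the score from networks trained on the data.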

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.07071/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1812.07071/full.md

---
Source: https://tomesphere.com/paper/1812.07071