# FiFTy: Large-scale File Fragment Type Identification using Neural Networks

**Authors:** Govind Mittal, Pawel Korus, Nasir Memon

arXiv: 1908.06148 · 2020-06-09

## TL;DR

FiFTy is a neural network-based tool for large-scale file fragment type identification. It is faster and more accurate than previous methods when evaluated on a novel dataset of 75 diverse file types.

## Contribution

The paper introduces FiFTy, a neural network approach to file type identification that eliminates explicit feature extraction and achieves higher accuracy and speed than prior tools.
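
To make "explicit feature extraction" concrete, here is a minimal sketch contrasting a hand-crafted feature vector with the raw-byte input an end-to-end model consumes. The specific features (a normalized 256-bin byte histogram plus Shannon entropy) and the function names `handcrafted_features` and `raw_byte_input` are hypothetical illustrations of the kind of statistics legacy classifiers compute. They are not taken from the paper or from any particular tool.

```python
import math
from collections import Counter


def handcrafted_features(fragment: bytes) -> list[float]:
    """Hypothetical legacy-style features: a normalized 256-bin byte
    histogram followed by the Shannon entropy of the fragment (bits/byte)."""
    n = len(fragment)
    if n == 0:
        return [0.0] * 257
    counts = Counter(fragment)  # iterating bytes yields ints 0..255
    histogram = [counts.get(b, 0) / n for b in range(256)]
    entropy = -sum(p * math.log2(p) for p in histogram if p > 0)
    return histogram + [entropy]


def raw_byte_input(fragment: bytes) -> list[int]:
    """End-to-end alternative: hand the raw byte values (0..255) to the
    model and let it learn its own representation."""
    return list(fragment)
```

For an all-zero 512-byte block, the entropy feature is 0. For a block containing each byte value exactly once, it is 8 bits. The feature pipeline must be designed and computed up front, while the raw-byte path defers all of that to training.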

## Key findings

- Average accuracy of 77.5% on 75 file types
- Processing speed of approximately 38 sec/GB
- More than an order of magnitude faster than the previous state-of-the-art tool, Sceadan (69% accuracy at 9 min/GB)
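
The "order of magnitude" claim follows directly from the throughput figures quoted in the abstract. The short calculation below converts Sceadan's 9 min/GB to seconds and compares it with FiFTy's approximately 38 sec/GB. The variable names are illustrative.

```python
# Throughput and accuracy figures quoted in the abstract.
FIFTY_SEC_PER_GB = 38          # "approx 38 sec/GB"
SCEADAN_SEC_PER_GB = 9 * 60    # "9 min/GB" converted to seconds
FIFTY_ACCURACY = 77.5          # average accuracy, percent
SCEADAN_ACCURACY = 69.0        # percent

speedup = SCEADAN_SEC_PER_GB / FIFTY_SEC_PER_GB
accuracy_gain = FIFTY_ACCURACY - SCEADAN_ACCURACY

print(f"Speedup: {speedup:.1f}x")                     # Speedup: 14.2x
print(f"Accuracy gain: {accuracy_gain:.1f} points")   # Accuracy gain: 8.5 points
```

A roughly 14x speedup is just over one order of magnitude. The 38 sec/GB figure is itself approximate, so the ratio should be read the same way.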

## Abstract

We present FiFTy, a modern file type identification tool for memory forensics and data carving. In contrast to previous approaches based on hand-crafted features, we design a compact neural network architecture, which uses a trainable embedding space, akin to successful natural language processing models. Our approach dispenses with explicit feature extraction, which is a bottleneck in legacy systems. We evaluate the proposed method on a novel dataset with 75 file types, the most diverse and balanced dataset reported to date. FiFTy consistently outperforms all baselines in terms of speed, accuracy and individual misclassification rates. We achieved an average accuracy of 77.5% with processing speed of approximately 38 sec/GB, which is better and more than an order of magnitude faster than the previous state-of-the-art tool, Sceadan (69% at 9 min/GB). Our tool and the corresponding dataset are available publicly online.
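
The idea of a "trainable embedding space" applied to bytes can be illustrated with a toy forward pass. Each of the 256 possible byte values maps to a learned vector, and the sequence of vectors is reduced to class probabilities over file types. This is a minimal, untrained sketch under stated assumptions, not FiFTy's network. The class name `ByteEmbeddingClassifier`, the embedding size, and the use of mean pooling followed by a single linear layer are all hypothetical simplifications. The actual architecture and hyperparameters are described in the full paper. Mean pooling ignores byte order, so it is a deliberate simplification and not a recommended design.

```python
import math
import random


class ByteEmbeddingClassifier:
    """Hypothetical toy model: byte embedding -> mean pooling -> linear -> softmax.
    Weights are random (untrained); sizes and pooling are illustrative assumptions."""

    def __init__(self, num_classes: int, embed_dim: int = 16, seed: int = 0):
        rng = random.Random(seed)
        # One trainable vector per possible byte value (0..255).
        self.embedding = [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)]
                          for _ in range(256)]
        # Linear layer mapping the pooled vector to one logit per file type.
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)]
                        for _ in range(num_classes)]
        self.bias = [0.0] * num_classes

    def forward(self, fragment: bytes) -> list[float]:
        if not fragment:
            raise ValueError("fragment must be non-empty")
        dim = len(self.embedding[0])
        # Look up each byte's vector and average over positions.
        pooled = [0.0] * dim
        for b in fragment:
            vec = self.embedding[b]
            for i in range(dim):
                pooled[i] += vec[i]
        pooled = [x / len(fragment) for x in pooled]
        # One logit per class.
        logits = [sum(w * x for w, x in zip(row, pooled)) + b
                  for row, b in zip(self.weights, self.bias)]
        # Numerically stable softmax over file-type classes.
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Usage: score a 512-byte all-zero block against 75 hypothetical classes.
model = ByteEmbeddingClassifier(num_classes=75)
probs = model.forward(bytes(512))
predicted = max(range(len(probs)), key=probs.__getitem__)
```

In a real system, the embedding table and classifier weights would be learned from labeled fragments. This replaces the hand-designed feature step that the abstract identifies as a bottleneck.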

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.06148/full.md

## Figures

20 figures with captions in the complete paper: https://tomesphere.com/paper/1908.06148/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/1908.06148/full.md

---
Source: https://tomesphere.com/paper/1908.06148