EnCoD: Distinguishing Compressed and Encrypted File Fragments
Fabio De Gaspari, Dorjan Hitaj, Giulio Pagnotta, Lorenzo De Carli, Luigi V. Mancini

TL;DR
This paper introduces EnCoD, a machine learning classifier that reliably distinguishes compressed from encrypted file fragments, overcoming the limitations of entropy-based methods, especially for small fragments and diverse data types.
Contribution
The paper presents EnCoD, a novel learning-based approach that outperforms existing statistical tests in distinguishing compressed from encrypted data across various fragment sizes and data types.
Findings
EnCoD outperforms state-of-the-art methods in most scenarios.
Current entropy-based approaches are unreliable for small fragments.
EnCoD effectively distinguishes data types starting from 512-byte fragments.
Abstract
Reliable identification of encrypted file fragments is a requirement for several security applications, including ransomware detection, digital forensics, and traffic analysis. A popular approach consists of estimating high entropy as a proxy for randomness. However, many modern content types (e.g. office documents, media files, etc.) are highly compressed for storage and transmission efficiency. Compression algorithms also output high-entropy data, thus reducing the accuracy of entropy-based encryption detectors. Over the years, a variety of approaches have been proposed to distinguish encrypted file fragments from high-entropy compressed fragments. However, these approaches are typically only evaluated over a few, select data types and fragment sizes, which makes a fair assessment of their practical applicability impossible. This paper aims to close this gap by comparing existing…
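The entropy problem described in the abstract can be illustrated with a minimal sketch. This is not EnCoD itself, only the naive byte-entropy estimate that the abstract says is unreliable. The helper names `byte_entropy` and `sample_fragments` are hypothetical, the input text is synthetic, and `os.urandom` output is used as an assumed stand-in for ciphertext, since well-encrypted data is expected to look uniformly random.

```python
import math
import os
import zlib
from collections import Counter


def byte_entropy(fragment: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte (0 to 8)."""
    if not fragment:
        return 0.0
    counts = Counter(fragment)
    total = len(fragment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def sample_fragments(size: int = 512):
    """Return (plain, compressed, random_like) fragments of `size` bytes each.

    Assumptions for illustration only: the plaintext is synthetic, and
    os.urandom bytes stand in for an encrypted fragment.
    """
    plain = " ".join(str(i * i) for i in range(20000)).encode()
    compressed = zlib.compress(plain, 9)
    random_like = os.urandom(size)
    # Take the compressed fragment from the middle of the stream so the
    # zlib header and trailer do not influence the estimate.
    start = len(compressed) // 2
    return plain[:size], compressed[start:start + size], random_like


if __name__ == "__main__":
    plain, compressed, random_like = sample_fragments(512)
    for label, frag in [("plain", plain),
                        ("compressed", compressed),
                        ("random (ciphertext stand-in)", random_like)]:
        print(f"{label:30s} {byte_entropy(frag):.3f} bits/byte")
```

On a run of this sketch, the plaintext fragment scores far below the other two, while the compressed and random fragments both land close to the 8 bits/byte ceiling. A single entropy threshold therefore separates plaintext from everything else but gives little help in telling compressed data from encrypted data, which is the gap the paper targets.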
