Reliable Detection of Compressed and Encrypted Data
Fabio De Gaspari, Dorjan Hitaj, Giulio Pagnotta, Lorenzo De Carli, Luigi V. Mancini

TL;DR
This paper introduces EnCoD, a learning-based classifier that reliably distinguishes between compressed and encrypted data fragments across various file types and sizes, outperforming existing statistical methods.
Contribution
The paper presents EnCoD, a novel machine learning approach that improves detection accuracy and identifies data formats, addressing limitations of current statistical techniques.
Findings
EnCoD achieves 82-92% accuracy across fragment sizes.
Current statistical methods fail on compressed data.
EnCoD accurately identifies data formats.
Abstract
Several cybersecurity domains, such as ransomware detection, forensics and data analysis, require methods to reliably identify encrypted data fragments. Typically, current approaches employ statistics derived from byte-level distribution, such as entropy estimation, to identify encrypted fragments. However, modern content types use compression techniques which alter the data distribution, pushing it closer to the uniform distribution. The result is that current approaches exhibit unreliable encryption detection performance when compressed data appears in the dataset. Furthermore, proposed approaches are typically evaluated over few data types and fragment sizes, making it hard to assess their practical applicability. This paper compares existing statistical tests on a large, standardized dataset and shows that current approaches consistently fail to distinguish encrypted and compressed data…
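To illustrate why byte-level entropy struggles here, the following is a minimal sketch rather than anything taken from the paper. It computes the Shannon entropy of three fragments: plain text, the same text compressed with zlib, and random bytes from `os.urandom`, used as a stand-in for ciphertext. The helper names `byte_entropy` and `sample_fragments`, the 4096-byte fragment size, and the choice of zlib are all assumptions made for this example.

```python
import math
import os
import zlib
from collections import Counter


def byte_entropy(fragment: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte (0 to 8)."""
    if not fragment:
        return 0.0
    total = len(fragment)
    counts = Counter(fragment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def sample_fragments(size: int = 4096) -> dict:
    """Build three illustrative fragments of at most `size` bytes each.

    Hypothetical helper: the inputs are synthetic and only meant to show
    the trend described in the abstract, not to reproduce its dataset.
    """
    # Plain text made of digits and spaces: a small alphabet, so low entropy.
    plain = " ".join(str(i) for i in range(20000)).encode("ascii")
    # zlib output approximates a compressed content type.
    compressed = zlib.compress(plain, 9)
    # Random bytes stand in for encrypted data (assumption: good ciphertext
    # is statistically close to uniform random bytes).
    random_like = os.urandom(size)
    return {
        "plain": plain[:size],
        "compressed": compressed[:size],
        "random": random_like,
    }


if __name__ == "__main__":
    for name, fragment in sample_fragments().items():
        print(f"{name:>10}: {byte_entropy(fragment):.3f} bits/byte")
```

Running the script typically shows the plain fragment well below the other two, while the compressed and random fragments both sit near the 8 bits/byte maximum. That narrow gap is what makes a simple entropy threshold unreliable for separating compressed from encrypted data.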
Taxonomy
Topics: Advanced Malware Detection Techniques · Digital Media Forensic Detection · Digital and Cyber Forensics
