On the Abuse and Detection of Polyglot Files
Luke Koch, Sean Oesch, Amul Chaulagain, Jared Dixon, Matthew Dixon, Mike Huettal, Amir Sadovnik, Cory Watson, Brian Weber, Jacob Hartman, Richard Patulski

TL;DR
This paper investigates the security risks of polyglot files, reveals their misuse in cyber attacks, and introduces machine learning tools for effective detection and sanitization to improve cybersecurity defenses.
Contribution
It provides the first comprehensive survey of polyglot usage in the wild, creates a novel dataset, and develops PolyConv and ImSan tools for detection and sanitization.
Findings
PolyConv achieves 99.20% F1 score in detection.
ImSan successfully sanitizes 100% of tested image polyglots.
Polyglot files are exploited in real-world cyber attack chains.
Abstract
A polyglot is a file that is valid in two or more formats. Polyglot files pose a problem for malware detection systems that route files to format-specific detectors/signatures, as well as file upload and sanitization tools. In this work we found that existing file-format and embedded-file detection tools, even those developed specifically for polyglot files, fail to reliably detect polyglot files used in the wild, leaving organizations vulnerable to attack. To address this issue, we studied the use of polyglot files by malicious actors in the wild, finding polyglot samples and attack chains that leveraged polyglot files. In this report, we highlight two well-known APTs whose cyber attack chains relied on polyglot files to bypass detection mechanisms. Using knowledge from our survey of polyglot usage in the wild -- the first of its kind -- we created a novel data set based on…
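To make the definition concrete, here is a minimal Python sketch of one common polyglot shape: a file that starts with image magic bytes and is also a readable ZIP archive. It is a naive signature heuristic for illustration only, not PolyConv or any method from the paper, and the abstract notes that format-detection tools can fail on polyglots seen in the wild. The function names and the demo file are hypothetical.

```python
import io
import zipfile
from typing import Optional

# Well-known leading signatures ("magic bytes") for two image formats.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def leading_format(data: bytes) -> Optional[str]:
    """Return the image format suggested by the first bytes, if any."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    return None


def looks_like_image_zip_polyglot(data: bytes) -> bool:
    """Heuristic: image magic at the start AND a ZIP structure in the file.

    ZIP readers locate the archive from the end of the file, so an archive
    appended to an image can still be opened as a ZIP.
    """
    return leading_format(data) is not None and zipfile.is_zipfile(io.BytesIO(data))


def make_demo_polyglot() -> bytes:
    """Build a toy sample: PNG magic bytes followed by a small ZIP archive.

    Assumption for illustration: this is not a valid PNG image, only
    enough to trigger a magic-byte check.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("payload.txt", "hello")
    return PNG_MAGIC + b"\x00" * 16 + buf.getvalue()


if __name__ == "__main__":
    sample = make_demo_polyglot()
    print(leading_format(sample), looks_like_image_zip_polyglot(sample))
```

A tool that routes files by their leading bytes alone would treat this sample as an image and never inspect the archive, which is the routing problem the abstract describes.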
Taxonomy
Topics: Digital Media Forensic Detection · Digital and Cyber Forensics · Advanced Malware Detection Techniques
Methods: Sparse Evolutionary Training · Polynomial Convolution
