Adversarial Networks and Machine Learning for File Classification
Ken St. Germain, Josh Angichiodo

TL;DR
This paper presents a semi-supervised adversarial neural network that accurately classifies file types even when file headers or extensions are concealed, outperforming traditional methods, especially when labeled data is limited.
Contribution
The authors introduce a novel semi-supervised generative adversarial network for file classification, demonstrating superior accuracy and robustness over existing models in obfuscated scenarios.
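The summary does not describe the network's architecture or loss. In the common K+1-class formulation of semi-supervised GANs, which is assumed here only for illustration, the discriminator doubles as the classifier. It outputs one logit per real file type plus an extra "generated" class. The plain-Python sketch below computes such a discriminator loss. `NUM_REAL_CLASSES`, `discriminator_loss` and `predict_file_type` are hypothetical names, and the paper's actual layers, inputs and loss weighting may differ.

```python
import math
from typing import List, Sequence

# Assumption for illustration: K = 11 real file-type classes (the count reported
# in the summary) plus one extra "generated" class stored at index K.
NUM_REAL_CLASSES = 11
FAKE = NUM_REAL_CLASSES  # index of the extra "generated" class


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def discriminator_loss(labeled: List[Sequence[float]], labels: List[int],
                       unlabeled: List[Sequence[float]],
                       generated: List[Sequence[float]]) -> float:
    """Hypothetical SGAN discriminator loss; each term is averaged over its batch.

    Every logit vector has length K + 1. `labels` holds true classes in [0, K).
    """
    # 1. Supervised term: cross-entropy of labeled files against their true type.
    sup = -sum(z[y] - logsumexp(z) for z, y in zip(labeled, labels)) / len(labeled)

    # 2. Unlabeled real files: maximize log p(real) = log(1 - p(fake)),
    #    computed as logsumexp(real logits) - logsumexp(all logits).
    unsup_real = -sum(logsumexp(z[:NUM_REAL_CLASSES]) - logsumexp(z)
                      for z in unlabeled) / len(unlabeled)

    # 3. Generator outputs: maximize log p(fake).
    unsup_fake = -sum(z[FAKE] - logsumexp(z) for z in generated) / len(generated)

    return sup + unsup_real + unsup_fake


def predict_file_type(logits: Sequence[float]) -> int:
    """At inference time, drop the fake class and return the best real class."""
    return max(range(NUM_REAL_CLASSES), key=lambda k: logits[k])
```

In this formulation, labeled files contribute only through the supervised term, while unlabeled files still shape the decision boundary through the "real vs. generated" term. That is one plausible reason an adversarially trained classifier would help most when few labeled samples are available, though the summary does not confirm this mechanism.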
Findings
Achieved 97.6% accuracy across 11 file types
Outperformed a standalone neural network and three other machine learning algorithms
Advantage most pronounced when few labeled samples were available
Abstract
Correctly identifying the type of file under examination is a critical part of a forensic investigation. The file type alone suggests the embedded content, such as a picture, video, manuscript, spreadsheet, etc. In cases where a system owner might desire to keep their files inaccessible or their file types concealed, we propose using an adversarially trained machine learning neural network to determine a file's true type even if the extension or file header is obfuscated to complicate its discovery. Our semi-supervised generative adversarial network (SGAN) achieved 97.6% accuracy in classifying files across 11 different types. We also compared our network against a traditional standalone neural network and three other machine learning algorithms. The adversarially trained network proved to be the most precise file classifier, especially in scenarios with few supervised samples available. Our…
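The abstract does not state what input representation the network consumes. As an illustrative assumption only, the sketch below builds a header-agnostic feature vector: a normalized 256-bin byte-frequency histogram that never looks at the file name and skips a fixed number of leading bytes (`HEADER_SKIP`, a hypothetical value). Renaming a file or overwriting its magic number therefore leaves the features unchanged.

```python
from typing import List

HEADER_SKIP = 512  # hypothetical: number of leading bytes ignored in every file


def byte_histogram(data: bytes, skip: int = HEADER_SKIP) -> List[float]:
    """Normalized 256-bin byte-frequency vector, ignoring the first `skip` bytes.

    Only raw content is used: the file name (and thus its extension) is never
    consulted, and the header region is dropped before counting.
    """
    body = data[skip:]
    counts = [0] * 256
    for b in body:  # iterating over a bytes object yields ints in 0..255
        counts[b] += 1
    total = len(body)
    if total == 0:  # file shorter than the skipped header region
        return [0.0] * 256
    return [c / total for c in counts]
```

A vector like this could be fed to any of the classifiers the abstract compares. The design trade-off is that legitimate magic numbers are discarded along with forged ones, so the classifier has to rely entirely on the statistics of the file body.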
Taxonomy
Topics: Digital Media Forensic Detection · Digital and Cyber Forensics · Advanced Malware Detection Techniques
