Automatic Malware Description via Attribute Tagging and Similarity Embedding
Felipe N. Ducau, Ethan M. Rudd, Tad M. Heppner, Alex Long, and Konstantin Berlin

TL;DR
This paper introduces a deep learning model that generates human-interpretable semantic descriptions of malware, improving detection, understanding, and relationship analysis between malicious samples.
Contribution
It presents a novel attribute tagging and similarity embedding approach that enhances malware characterization beyond traditional signatures and family labels.
Findings
Over 95% accuracy in generating correct malware tags
Achieves a 32-fold reduction in representation size compared to raw features
Effective in identifying malware family relationships
Abstract
With the rapid proliferation and increased sophistication of malicious software (malware), detection methods no longer rely only on manually generated signatures but have also incorporated more general approaches such as machine learning detection. Although powerful for conviction of malicious artifacts, these methods do not produce any further information about the type of threat that has been detected, nor do they allow for identifying relationships between malware samples. In this work, we address the information gap between machine learning and signature-based detection methods by learning a representation space for malware samples in which files with similar malicious behaviors appear close to each other. We do so by introducing a deep learning based tagging model trained to generate human-interpretable semantic descriptions of malicious software, which, at the same time, provides…
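As a rough illustration of the two ideas in the abstract (a representation space in which similar files lie close together, and a tagging model that emits human-readable descriptions), the sketch below ranks samples by cosine similarity and thresholds per-tag scores. It is not the authors' implementation: the embedding vectors, sample names, tag names, scores, and the 0.5 threshold are all hypothetical values chosen for the example, and cosine similarity is assumed here as the distance measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_samples(query, embeddings, k=2):
    """Return the k (name, similarity) pairs most similar to the query embedding."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in embeddings.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def predicted_tags(tag_scores, threshold=0.5):
    """Multi-label decision: keep every tag whose score reaches the threshold."""
    return sorted(tag for tag, score in tag_scores.items() if score >= threshold)


# Hypothetical toy embeddings; a real model would produce these from file features.
embeddings = {
    "sample_a": [0.9, 0.1, 0.0],
    "sample_b": [0.8, 0.2, 0.1],
    "sample_c": [0.0, 0.1, 0.95],
}
query = [0.85, 0.15, 0.05]

neighbors = nearest_samples(query, embeddings, k=2)

# Hypothetical per-tag scores for one file, as a tagging model might output them.
tags = predicted_tags({"ransomware": 0.91, "spyware": 0.12, "dropper": 0.64})

print([name for name, _ in neighbors])
print(tags)
```

In this toy setup the query sits near `sample_a` and `sample_b` and far from `sample_c`, which is the kind of neighborhood structure the abstract describes for files with similar malicious behavior.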
Taxonomy
Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Spam and Phishing Detection
