Learning from Context: Exploiting and Interpreting File Path Information for Better Malware Detection

Adarsh Kyadige; Ethan M. Rudd; Konstantin Berlin

arXiv:1905.06987 · cs.CR · May 20, 2019 · 5 cites

Open Access · 1 Repo

TL;DR

This paper introduces a multi-view neural network that uses file path context alongside PE file content to improve malware detection. On a large real-world dataset, it reduces false negatives by roughly a third at false positive rates of 10^-3 and 10^-4.

Contribution

The study proposes a novel multi-view neural network model that incorporates file path information into static malware detection, enhancing detection performance with minimal additional infrastructure.

Findings

01  32.3% reduction in false negatives at 10^-3 FPR

02  33.1% reduction in false negatives at 10^-4 FPR

03  Model learns meaningful file path features and artifacts
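
The first two findings are relative reductions in the false negative rate (FNR), measured at a fixed false positive rate (FPR). The sketch below shows one way such a figure can be computed. The function names, the thresholding convention, and the toy scores are illustrative assumptions, not the authors' evaluation code or data.

```python
def fnr_at_fpr(scores, labels, target_fpr):
    """FNR at a threshold chosen so that the FPR does not exceed target_fpr.

    scores: higher means more malicious; labels: 1 = malware, 0 = benign.
    A sample is flagged when its score is strictly above the threshold.
    """
    benign = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    malicious = [s for s, y in zip(scores, labels) if y == 1]
    # Number of benign samples that may be flagged within the FPR budget.
    allowed_fp = int(target_fpr * len(benign))
    # Threshold = score of the first benign sample that must NOT be flagged.
    threshold = benign[allowed_fp] if allowed_fp < len(benign) else float("-inf")
    missed = sum(1 for s in malicious if s <= threshold)
    return missed / len(malicious)


def relative_reduction(baseline_fnr, new_fnr):
    """Fraction of the baseline's false negatives that the new model removes."""
    return (baseline_fnr - new_fnr) / baseline_fnr


# Toy data (made up for illustration): four benign samples, four malware samples.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
content_only = [0.1, 0.2, 0.6, 0.8, 0.4, 0.5, 0.7, 0.9]    # hypothetical baseline scores
content_path = [0.1, 0.2, 0.6, 0.8, 0.5, 0.65, 0.7, 0.9]   # hypothetical multi-view scores

base_fnr = fnr_at_fpr(content_only, labels, target_fpr=0.25)
path_fnr = fnr_at_fpr(content_path, labels, target_fpr=0.25)
reduction = relative_reduction(base_fnr, path_fnr)
```

On this toy data the baseline misses two of four malware samples and the second model misses one, a 50% relative reduction. The reported 32.3% and 33.1% are the same kind of ratio, taken at far stricter FPR targets.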

Abstract

Machine learning (ML) used for static portable executable (PE) malware detection typically employs per-file numerical feature vector representations as input with one or more target labels during training. However, there is much orthogonal information that can be gleaned from the context in which the file was seen. In this paper, we propose utilizing a static source of contextual information -- the path of the PE file -- as an auxiliary input to the classifier. While file paths are not malicious or benign in and of themselves, they do provide valuable context for a malicious/benign determination. Unlike dynamic contextual information, file paths are available with little overhead and can seamlessly be integrated into a multi-view static ML detector, yielding higher detection rates at very high throughput with minimal infrastructural changes. Here we propose a multi-view neural…
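
The multi-view idea in the abstract can be pictured as two input branches, one per view, whose outputs are concatenated before a final classification layer. The following is a toy, standard-library-only sketch under stated assumptions. The character-level path encoding, the maximum length of 100, the layer sizes, the random untrained weights, and the example path are all hypothetical choices for illustration, not the authors' architecture or parameters.

```python
import math
import random

MAX_PATH_LEN = 100   # assumed truncation length (hypothetical)
VOCAB_SIZE = 128     # ASCII codes; id 0 doubles as padding/unknown
EMBED_DIM = 4        # toy embedding width
PE_DIM = 6           # toy size of the PE content feature vector


def encode_path(path):
    """Map a file path to a fixed-length list of integer character ids."""
    chars = path.lower()[:MAX_PATH_LEN]
    ids = [ord(c) if 0 < ord(c) < VOCAB_SIZE else 0 for c in chars]
    return ids + [0] * (MAX_PATH_LEN - len(ids))


# Random, untrained weights: this sketch shows data flow only, not a detector.
rng = random.Random(0)
embedding = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(VOCAB_SIZE)]
pe_weights = [[rng.uniform(-1, 1) for _ in range(PE_DIM)] for _ in range(EMBED_DIM)]
out_weights = [rng.uniform(-1, 1) for _ in range(2 * EMBED_DIM)]


def path_branch(ids):
    """Path view: average the embeddings of the non-padding characters."""
    real = [i for i in ids if i != 0]
    if not real:
        return [0.0] * EMBED_DIM
    return [sum(embedding[i][d] for i in real) / len(real) for d in range(EMBED_DIM)]


def pe_branch(features):
    """Content view: one dense layer with ReLU over the PE feature vector."""
    return [max(0.0, sum(w * x for w, x in zip(row, features))) for row in pe_weights]


def predict(path, pe_features):
    """Concatenate both views and apply a single logistic output unit."""
    joint = path_branch(encode_path(path)) + pe_branch(pe_features)
    logit = sum(w * h for w, h in zip(out_weights, joint))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical path and feature values, for illustration only.
score = predict(r"C:\Users\alice\AppData\Local\Temp\setup.exe",
                [0.2, 0.0, 1.0, 0.5, 0.0, 0.3])
```

Because the path is just a string recorded alongside the file, the second branch adds an input without requiring any dynamic analysis. This matches the low-overhead argument made in the abstract.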

Code & Models

Repositories

dtrizna/quo.vadis
pytorch

Taxonomy

Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Anomaly Detection Techniques and Applications

Methods: Interpretability · Local Interpretable Model-Agnostic Explanations