Nested Multiple Instance Learning in Modelling of HTTP network traffic
Tomas Pevny, Marek Dedic

TL;DR
This paper introduces a nested multiple-instance learning approach to model complex structured network traffic data, improving malware detection accuracy and interpretability compared to prior methods.
Contribution
It presents a novel nested multiple-instance learning model that effectively captures structured data semantics for malware detection in network traffic.
Findings
Outperforms prior feature-based and CNN-based methods on unseen malware families
Provides interpretable feedback to security researchers
Achieves higher accuracy in domain generalization scenarios
Abstract
In many interesting cases, the application of machine learning is hindered by data with a complicated structure induced by structured file formats such as JSON, XML, or ProtoBuffers, which is non-trivial to convert to a vector or matrix. Moreover, since the structure frequently carries semantic meaning, reflecting it in the machine learning model should improve accuracy and, more importantly, facilitate the explanation of the model and its decisions. Using the identification of infected computers in a network from their HTTP traffic as an example, this paper demonstrates how to achieve this reflection using recent progress in multiple-instance learning. The proposed model is compared to two complementary approaches from the prior art, the first relying on human-designed features and the second on features learned automatically by convolutional neural networks. In a challenging…
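The nesting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a two-level hierarchy in which a computer is a bag of servers and each server is a bag of HTTP requests, with every request already encoded as a small feature vector. The names `pool` and `embed_computer`, the toy data, and the choice of mean and max pooling are all hypothetical. A trainable model of this kind would normally apply learned transformations before and after each pooling step; they are omitted here so that only the nested aggregation is shown.

```python
from typing import List, Sequence

Vector = List[float]


def pool(vectors: Sequence[Vector]) -> Vector:
    """Collapse a non-empty bag of equal-length vectors into one vector
    by concatenating the element-wise mean and the element-wise max."""
    if not vectors:
        raise ValueError("cannot pool an empty bag")
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    peak = [max(v[i] for v in vectors) for i in range(dim)]
    return mean + peak


def embed_computer(computer: Sequence[Sequence[Vector]]) -> Vector:
    """Two nested pooling steps (assumed hierarchy, for illustration only)."""
    # Inner level: each server's bag of requests becomes one vector.
    server_vectors = [pool(requests) for requests in computer]
    # Outer level: the bag of server vectors becomes one vector per computer.
    return pool(server_vectors)


# Hypothetical toy data: one computer, two servers, 2-dimensional request features.
computer = [
    [[1.0, 0.0], [3.0, 2.0]],  # server A: two requests
    [[0.0, 4.0]],              # server B: one request
]
embedding = embed_computer(computer)
print(embedding)
```

The resulting fixed-length vector could be passed to any ordinary classifier, regardless of how many servers or requests the computer produced. Because each level pools over an explicit bag, a decision can in principle be traced back to the servers and requests that contributed most, which is the kind of interpretability the abstract refers to.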
Taxonomy
Topics: Network Security and Intrusion Detection · Advanced Malware Detection Techniques · Internet Traffic Analysis and Secure E-voting
