# Multiple Instance Learning for Malware Classification

**Authors:** Jan Stiborek, Tomáš Pevný, Martin Rehák

arXiv: 1705.02268 · 2017-05-08

## TL;DR

This paper presents a multiple instance learning approach to malware classification that models a binary's interactions with system resources and the error messages returned by the operating system. It makes detection evasion more difficult and achieves superior results with only a fraction of the training samples.

## Contribution

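It introduces similarities tailored to individual resource types (files, mutexes, registry keys, network communication) that, combined with approximate clustering, group system resources and define features directly from data. In the paper's comparison on a large corpus of binaries, this outperforms the state of the art.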

## Key findings

- Achieves superior classification results using only a fraction of the training samples.
- Makes detection evasion more difficult by relying on a source of information different from that used by most prior art.
- Outperforms state-of-the-art methods in an extensive comparison on a large corpus of binaries.

## Abstract

This work addresses the classification of unknown binaries executed in a sandbox by modeling their interaction with system resources (files, mutexes, registry keys, and communication with servers over the network) and the error messages provided by the operating system, using a vocabulary-based method from the multiple instance learning paradigm. It introduces similarities suited to individual resource types that, combined with an approximate clustering method, efficiently group the system resources and define features directly from data. This approach effectively removes the randomization often employed by malware authors and projects samples into a low-dimensional feature space suitable for common classifiers. An extensive comparison to the state of the art on a large corpus of binaries demonstrates that the proposed solution achieves superior results using only a fraction of the training samples. Moreover, it makes use of a source of information different from that used by most prior art, which increases the diversity of malware-detection tools and hence makes detection evasion more difficult.
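The vocabulary-based idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's method: the distance (`path_distance`, one minus the Jaccard similarity of path components), the greedy leader clustering standing in for the approximate clustering step, the threshold value, and the toy file paths. Each binary is treated as a bag of resource names, cluster representatives form the vocabulary, and a bag becomes a binary vector marking which clusters it touches.

```python
from typing import Callable, List, Sequence


def path_distance(a: str, b: str) -> float:
    """Hypothetical distance: 1 - Jaccard similarity of path components."""
    ta = set(a.lower().replace("/", "\\").split("\\"))
    tb = set(b.lower().replace("/", "\\").split("\\"))
    ta.discard("")
    tb.discard("")
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


def leader_clustering(
    instances: Sequence[str],
    distance: Callable[[str, str], float],
    threshold: float,
) -> List[str]:
    """Greedy one-pass clustering: an instance becomes a new leader
    only if it is farther than `threshold` from every existing leader."""
    leaders: List[str] = []
    for x in instances:
        if all(distance(x, leader) > threshold for leader in leaders):
            leaders.append(x)
    return leaders


def embed_bag(
    bag: Sequence[str],
    leaders: Sequence[str],
    distance: Callable[[str, str], float],
    threshold: float,
) -> List[int]:
    """Map a bag of resource names to a binary vector over the vocabulary."""
    features = [0] * len(leaders)
    for x in bag:
        for i, leader in enumerate(leaders):
            if distance(x, leader) <= threshold:
                features[i] = 1
                break  # assign each instance to the first matching leader
    return features


# Toy bags: one list of touched file paths per (hypothetical) binary.
bags = [
    [r"C:\Users\bob\AppData\Temp\a8f3.tmp", r"C:\Windows\System32\kernel32.dll"],
    [r"C:\Users\bob\AppData\Temp\zz91.tmp"],
    [r"C:\Windows\System32\kernel32.dll", r"C:\Program Files\App\app.exe"],
]

THRESHOLD = 0.5  # illustrative value, not taken from the paper

all_instances = [path for bag in bags for path in bag]
vocabulary = leader_clustering(all_instances, path_distance, THRESHOLD)
features = [embed_bag(bag, vocabulary, path_distance, THRESHOLD) for bag in bags]
print(features)  # [[1, 1, 0], [1, 0, 0], [0, 1, 1]]
```

In this toy run the two randomly named `.tmp` files fall into the same cluster, so the first two bags share a feature despite the different file names. The resulting fixed-length vectors could then be passed to any common classifier.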

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.02268/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1705.02268/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/1705.02268/full.md

---
Source: https://tomesphere.com/paper/1705.02268